
Abstract 

The classical problem of quickest change detection is studied with an additional constraint on the 
cost of observations used in the detection process. The change point is modeled as an unknown constant, 
and minimax formulations are proposed for the problem. The objective in these formulations is to find 
a stopping time and an on-off observation control policy for the observation sequence, to minimize 
a version of the worst possible average delay, subject to constraints on the false alarm rate and the 
fraction of time observations are taken before change. An algorithm called DE-CuSum is proposed and 
is shown to be asymptotically optimal for the proposed formulations, as the false alarm rate goes to zero. 
Numerical results are used to show that the DE-CuSum algorithm has good trade-off curves and performs 
significantly better than the approach of fractional sampling, in which the observations are skipped using 
the outcome of a sequence of coin tosses, independent of the observation process. This work is guided 
by the insights gained from an earlier study of a Bayesian version of this problem. 

I. Introduction 

In the problem of quickest change detection, a decision maker observes a sequence of random variables $\{X_n\}$. At some point of time $\gamma$, called the change point, the distribution of the random variables changes. The goal of the decision maker is to find a stopping time $\tau$ on $\{X_n\}$, so as to minimize the average value of the delay $\max\{0, \tau - \gamma\}$. The delay is zero on the event $\{\tau < \gamma\}$, but this event is treated as a false alarm and is not desirable. Thus, the average delay has to be minimized subject to a constraint on the false alarm rate. This problem finds application in statistical quality control in industrial processes, surveillance using sensor networks, and cognitive radio networks; see [1], [2], [3].

In the i.i.d. model of the quickest change detection problem, the random variables $\{X_n\}$ for $n < \gamma$ are independent and identically distributed (i.i.d.) with probability density function (p.d.f.) $f_0$, and $\{X_n\}$ for $n \ge \gamma$ are i.i.d. with p.d.f. $f_1$. In the Bayesian version of the quickest change detection problem, the change point $\gamma$ is modeled as a random variable $\Gamma$.

In [4], [5] the i.i.d. model is studied in a Bayesian setting by assuming the change point $\Gamma$ to be a geometrically distributed random variable. The objective is to minimize the average detection delay with a constraint on the probability of false alarm. It is shown that under very general conditions on $f_0$ and $f_1$, the optimal stopping time is the one that stops the first time the a posteriori probability $P(\Gamma \le n \mid X_1, \ldots, X_n)$ crosses a pre-designed threshold. The threshold is chosen to meet the false alarm constraint with equality. In the following we refer to this algorithm as the Shiryaev algorithm.

In [6], [7], [8], [9], [10], [11], no prior knowledge about the distribution of the change point is assumed, and the change point is modeled as an unknown constant. In this non-Bayesian setting, the quickest change detection problem is studied in two different minimax settings introduced in [6] and [7]. The objective in [6]-[11] is to minimize some version of the worst case average delay, subject to a constraint on the mean time to false alarm. The results from these papers show that variants of the Shiryaev-Roberts algorithm [12], the latter being derived from the Shiryaev algorithm by setting the geometric parameter to zero, and the CuSum algorithm [13], are asymptotically optimal for both minimax formulations, as the mean time to false alarm goes to infinity.

In many applications of quickest change detection, changes are infrequent and there is a cost associated 
with acquiring observations (data). As a result, it is of interest to study the classical quickest change 
detection problem with an additional constraint on the cost of observations used before the change point, 
with the cost of taking observations after the change point being penalized through the metric on delay. 
In the following, we refer to this problem as data-efficient quickest change detection. 



In [14], we studied data-efficient quickest change detection in a Bayesian setting by adding another constraint to the Bayesian formulation of [4]. The objective was to find a stopping time and an on-off observation control policy on the observation sequence, to minimize the average detection delay subject to constraints on the probability of false alarm and the average number of observations used before the change point. Thus, unlike the classical quickest change detection problem, where the decision maker only chooses one of two controls, to stop and declare change or to continue taking observations, in the data-efficient quickest change detection problem we considered in [14], the decision maker must also decide, when the decision is to continue, whether to take or skip the next observation.

For the i.i.d. model, and for geometrically distributed $\Gamma$, we showed in [14] that a two-threshold algorithm is asymptotically optimal, as the probability of false alarm goes to zero. This two-threshold algorithm, which we call the DE-Shiryaev algorithm in the following, is a generalized version of the Shiryaev algorithm from [4]. In the DE-Shiryaev algorithm, the a posteriori probability that the change has already happened, conditioned on the available information, is computed at each time step, and the change is declared the first time this probability crosses a threshold $A$. When the a posteriori probability is below this threshold $A$, observations are taken only when this probability is above another threshold $B < A$. When an observation is skipped, the a posteriori probability is updated using the prior on the change point random variable. We also showed that, for reasonable values of the false alarm constraint and the observation cost constraint, these two thresholds can be selected independently of each other: the upper threshold $A$ can be selected directly from the false alarm constraint and the lower threshold $B$ can be selected directly from the observation cost constraint. Finally, we showed that the DE-Shiryaev algorithm achieves a significant gain in performance over the approach of fractional sampling, where the Shiryaev algorithm is used and an observation is skipped based on the outcome of a coin toss.

In this paper we study the data-efficient quickest change detection problem in a non-Bayesian setting, by introducing an additional constraint on the cost of observations used in the detection process in the minimax settings of [6] and [7]. We first use the insights from the Bayesian analysis in [14] to propose a metric for data efficiency in the absence of knowledge of the distribution of the change point. This metric is the fraction of time samples are taken before change. We then propose extensions of the minimax formulations in [6] and [7] by introducing an additional constraint on data efficiency in these formulations. Thus, the objective is to find a stopping time and an on-off observation control policy to minimize a version of the worst case average delay, subject to constraints on the mean time to false alarm and the fraction of time observations are taken before change. Then, motivated by the structure of the DE-Shiryaev algorithm, we propose an extension of the CuSum algorithm from [13]. We call this extension the DE-CuSum algorithm. We show that the DE-CuSum algorithm inherits the good properties of the DE-Shiryaev algorithm. That is, the DE-CuSum algorithm is asymptotically optimal, is easy to design, and provides substantial performance improvements over the approach of fractional sampling, where the CuSum algorithm is used and observations are skipped based on the outcome of a sequence of coin tosses, independent of the observation process.

The problem of detecting an anomaly in the behavior of an industrial process, under cost considerations, is also considered in the literature of statistical process control. There it is studied under the heading of sampling rate control or sampling size control; see [15] and [16] for a detailed survey, and the references in [14] for some recent results. However, none of these references study the data-efficient quickest change detection problem under the classical quickest change detection setting, as done by us in [14] and in this paper. For a result similar to our work in [14] in a Bayesian setting, see [17]. See [18] and [19] for other interesting formulations of quickest change detection with observation control.

Since our work in this paper on data-efficient non-Bayesian quickest change detection is motivated by our work on data-efficient Bayesian quickest change detection [14], in Section II we provide a detailed overview of the results from [14]. We also comment on the insights provided by the Bayesian analysis, which we use in the development of a theory for the non-Bayesian setting. In Section III, Section IV, and Section V we provide details of the minimax formulations, a description of the DE-CuSum algorithm, and the analysis of the DE-CuSum algorithm, respectively. We provide the numerical results in Section VI. Table I provides a glossary of the terms used in the paper.



II. Data-Efficient Bayesian Quickest Change Detection 

In this section we review the Bayesian version of the data-efficient quickest change detection problem we studied in [14]. We consider the i.i.d. model, i.e., $\{X_n\}$ is a sequence of random variables, $\{X_n\}$ for $n < \Gamma$ are i.i.d. with p.d.f. $f_0$, and $\{X_n\}$ for $n \ge \Gamma$ are i.i.d. with p.d.f. $f_1$. We further assume that $\Gamma$ is geometrically distributed with parameter $\rho$:

$$P(\Gamma = n) = (1-\rho)^{n-1}\rho, \quad n \ge 1.$$

For data-efficient quickest change detection we consider the following class of control policies. At each time $n$, $n \ge 0$, a decision is made as to whether to take or skip the observation at time $n+1$. Let $M_n$ be the indicator random variable such that $M_n = 1$ if $X_n$ is used for decision making, and $M_n = 0$ otherwise. Thus, $M_{n+1}$ is a function of the information available at time $n$, i.e.,

$$M_{n+1} = \phi_n(I_n),$$

where $\phi_n$ is the control law at time $n$, and

$$I_n = \left[M_1, \ldots, M_n, X_1^{(M_1)}, \ldots, X_n^{(M_n)}\right]$$

represents the information at time $n$. Here, $X_i^{(M_i)}$ represents $X_i$ if $M_i = 1$; otherwise $X_i$ is absent from the information vector $I_n$. Also, $I_0$ is the empty set.
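TABLE I: Glossary

Symbol | Definition/Interpretation
$o(1)$ | $x = o(1)$ as $c \to c_0$ if $\forall \epsilon > 0$, $\exists \delta > 0$ s.t. $|x| \le \epsilon$ if $|c - c_0| < \delta$
$O(1)$ | $x = O(1)$ as $c \to c_0$ if $\exists \epsilon > 0$, $\delta > 0$ s.t. $|x| \le \epsilon$ if $|c - c_0| < \delta$
$g(c) \sim h(c)$ as $c \to c_0$ | $\lim_{c \to c_0} g(c)/h(c) = 1$, or $g(c) = h(c)(1 + o(1))$ as $c \to c_0$
$P_n$ ($E_n$) | Probability measure (expectation) when the change occurs at time $n$
$P_\infty$ ($E_\infty$) | Probability measure (expectation) when the change does not occur
$\operatorname{ess\,sup} X$ | $\inf\{K \in \mathbb{R} : P(X > K) = 0\}$
$D(f_1 \| f_0)$ | K-L divergence between $f_1$ and $f_0$, defined as $E_1\left[\log \frac{f_1(X)}{f_0(X)}\right]$
$D(f_0 \| f_1)$ | K-L divergence between $f_0$ and $f_1$, defined as $E_\infty\left[\log \frac{f_0(X)}{f_1(X)}\right]$
$(x)^+$ | $\max\{x, 0\}$
$(x)^{h+}$ | $\max\{x, -h\}$
$M_n$ | $M_n = 1$ if $X_n$ is used for decision making
$\Psi$ | Policy for data-efficient quickest change detection, $\{\tau, M_1, \ldots, M_\tau\}$
$\mathrm{ADD}(\Psi)$ | $\sum_{n=1}^{\infty} P(\Gamma = n)\, E_n\left[(\tau - n)^+\right]$
$\mathrm{PFA}(\Psi)$ | $\sum_{n=1}^{\infty} P(\Gamma = n)\, P_n(\tau < n)$
$\mathrm{FAR}(\Psi)$ | $1 / E_\infty[\tau]$
$\mathrm{WADD}(\Psi)$ | $\sup_{n \ge 1} \operatorname{ess\,sup} E_n\left[(\tau - n)^+ \mid I_{n-1}\right]$
$\mathrm{CADD}(\Psi)$ | $\sup_{n \ge 1} E_n[\tau - n \mid \tau \ge n]$
$\mathrm{PDC}(\Psi)$ | $\limsup_{n \to \infty} \frac{1}{n} E_n\left[\sum_{k=1}^{n-1} M_k \,\middle|\, \tau \ge n\right]$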
For time $n \ge 1$, based on the information vector $I_n$, a decision is made whether to stop and declare change or to continue taking observations. Let $\tau$ be a stopping time on the information sequence $\{I_n\}$, that is, $\mathbb{I}_{\{\tau = n\}}$ is a measurable function of $I_n$. Here, $\mathbb{I}_F$ represents the indicator of the event $F$. Thus, a policy for data-efficient quickest change detection is $\Psi = \{\tau, \phi_0, \ldots, \phi_{\tau-1}\}$.

Define the average detection delay

$$\mathrm{ADD}(\Psi) = E\left[(\tau - \Gamma)^+\right],$$

the probability of false alarm

$$\mathrm{PFA}(\Psi) = P(\tau < \Gamma),$$

and the metric for data-efficiency in the Bayesian setting we considered in [14], the average number of observations used before the change point,

$$\mathrm{ANO}(\Psi) = E\left[\sum_{n=1}^{\min(\tau, \Gamma - 1)} M_n\right].$$

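The objective in [14] is to solve the following optimization problem:
Problem 1:

$$\begin{aligned} &\text{minimize } \mathrm{ADD}(\Psi), \\ &\text{subject to } \mathrm{PFA}(\Psi) \le \alpha, \\ &\text{and } \mathrm{ANO}(\Psi) \le \zeta. \end{aligned} \tag{1}$$

Here, $\alpha$ and $\zeta$ are given constraints.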

Remark 1: When $\zeta \ge E[\Gamma] - 1$, Problem 1 reduces to the classical Bayesian quickest change detection problem considered by Shiryaev in [4].

A. The DE-Shiryaev algorithm
Define

$$p_n = P(\Gamma \le n \mid I_n).$$

Then, the two-threshold algorithm from [14] is:

Algorithm 1 (DE-Shiryaev: $\Psi(A, B)$): Start with $p_0 = 0$ and use the following control, with $B < A$, for $n \ge 0$:

$$M_{n+1} = \phi_n(p_n) = \begin{cases} 0 & \text{if } p_n < B, \\ 1 & \text{if } p_n \ge B, \end{cases} \tag{2}$$

$$\tau_d = \inf\{n \ge 1 : p_n > A\}.$$
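The probability $p_n$ is updated using the following recursions:

$$p_{n+1} = \begin{cases} \tilde{p}_n & \text{if } M_{n+1} = 0, \\ \dfrac{\tilde{p}_n L(X_{n+1})}{\tilde{p}_n L(X_{n+1}) + (1 - \tilde{p}_n)} & \text{if } M_{n+1} = 1, \end{cases}$$

with $\tilde{p}_n = p_n + (1 - p_n)\rho$ and $L(X_{n+1}) = f_1(X_{n+1}) / f_0(X_{n+1})$.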
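The recursion above translates directly into code. The following is a minimal Python sketch of Algorithm 1, not the implementation used in [14]; it assumes the Gaussian pair $f_0 = \mathcal{N}(0,1)$, $f_1 = \mathcal{N}(0.8,1)$ of Fig. 1, and the names `gaussian_lr` and `de_shiryaev` are hypothetical. The control for time $n$ is computed from $p_{n-1}$, and a skipped sample only triggers the prior step $\tilde{p}_n$.

```python
import math
import random


def gaussian_lr(x, mu0=0.0, mu1=0.8, sigma=1.0):
    """Likelihood ratio L(x) = f1(x) / f0(x) for N(mu1, sigma^2) versus N(mu0, sigma^2)."""
    return math.exp((mu1 - mu0) * (x - (mu0 + mu1) / 2.0) / sigma**2)


def de_shiryaev(xs, rho, A, B, lr=gaussian_lr):
    """Run the DE-Shiryaev rule on xs, where xs[0] plays the role of X_1.

    Returns (tau_d or None, number of samples used, [p_0, p_1, ...]).
    Entries of xs at skipped times are never read.
    """
    p = 0.0                    # p_0 = 0
    used = 0
    trajectory = [p]
    for n, x in enumerate(xs, start=1):
        take = p >= B                    # M_n = phi_{n-1}(p_{n-1}), rule (2)
        p_tilde = p + (1.0 - p) * rho    # prior step of the geometric model
        if take:
            num = p_tilde * lr(x)
            p = num / (num + (1.0 - p_tilde))   # Bayes update with X_n
            used += 1
        else:
            p = p_tilde                         # X_n skipped: prior step only
        trajectory.append(p)
        if p > A:                               # tau_d = inf{n >= 1 : p_n > A}
            return n, used, trajectory
    return None, used, trajectory


if __name__ == "__main__":
    rng = random.Random(1)
    gamma = 100  # hypothetical change point for this demo
    xs = [rng.gauss(0.8 if n >= gamma else 0.0, 1.0) for n in range(1, 400)]
    tau_d, used, _ = de_shiryaev(xs, rho=0.01, A=0.9, B=0.2)
    print("stopped at", tau_d, "using", used, "samples")
```

With $B = 0$ every sample is taken and the sketch runs the plain Shiryaev recursion; with all samples skipped, $p_n = 1 - (1-\rho)^n$, the monotone rise seen below $B$ in Fig. 1.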







Remark 2: With $B = 0$, the DE-Shiryaev algorithm reduces to the Shiryaev algorithm from [4].

The motivation for this algorithm comes from the fact that $p_n$ is a sufficient statistic for a Lagrangian relaxation of Problem 1. This relaxed problem can be studied using dynamic programming, and numerical studies of the resulting Bellman equation show that the DE-Shiryaev algorithm is optimal for a wide choice of system parameters. For an analytical justification, see Section II-B below.



When Algorithm 1 is employed, the probability $p_n$ typically evolves as depicted in Fig. 1. As observed in Fig. 1, the evolution starts with an initial value of $p_0 = 0$. This is because we have implicitly assumed that the probability that the change has already happened even before we start taking observations is zero. Also, note that when $p_n < B$, $p_n$ increases monotonically. This is because when an observation is skipped, $p_n$ is updated using the prior on the change point, and as a result the probability that the change has already happened increases monotonically. The change is declared at time $\tau_d$, the first time $p_n$ crosses the threshold $A$.

Fig. 1: Typical evolution of $p_n$ for $f_0 \sim \mathcal{N}(0,1)$, $f_1 \sim \mathcal{N}(0.8,1)$, and $\rho = 0.01$, with thresholds $A = 0.9$ and $B = 0.2$.

B. Asymptotic Optimality and trade-off curves

It is shown in [14] that the PFA and ADD of the DE-Shiryaev algorithm approach those of the Shiryaev algorithm as $\alpha \to 0$. Specifically, the following theorem is proved.
Theorem 2.1: If

$$0 < D(f_0 \| f_1) < \infty \quad \text{and} \quad 0 < D(f_1 \| f_0) < \infty,$$

and $L(X)$ is non-arithmetic (see [20]), then for a fixed $\zeta$, the threshold $B$ can be selected such that for every $A > B$,

$$\mathrm{ANO}(\Psi(A, B)) \le \zeta,$$

and with $B$ fixed to this value,

$$\mathrm{ADD}(\Psi(A, B)) \sim \frac{|\log \alpha|}{D(f_1 \| f_0) + |\log(1 - \rho)|} \quad \text{as } \alpha \to 0, \tag{3}$$

and

$$\mathrm{PFA}(\Psi(A, B)) \sim \alpha \left(\int_0^\infty e^{-x}\, dR(x)\right) \quad \text{as } \alpha \to 0. \tag{4}$$

Here, $R(x)$ is the asymptotic overshoot distribution of the random walk $\sum_{k=1}^{n} \left(\log L(X_k) + |\log(1 - \rho)|\right)$ when it crosses a large positive boundary under $f_1$. Since (3) and (4) are also the performance of the Shiryaev algorithm as $\alpha \to 0$ [5], the DE-Shiryaev algorithm is asymptotically optimal.
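As a worked example of (3), the leading delay term can be evaluated for the Gaussian mean-shift pair used in Fig. 2. The sketch below assumes $f_0 = \mathcal{N}(0,1)$ and $f_1 = \mathcal{N}(\theta,1)$, for which $D(f_1 \| f_0) = \theta^2/2$ (0.32 for $\theta = 0.8$); the helper names are hypothetical, and only the first-order term is computed, so the numbers are rough guides rather than predictions of simulated delay.

```python
import math


def kl_gaussian_mean_shift(theta, sigma=1.0):
    """D(f1 || f0) for f0 = N(0, sigma^2) and f1 = N(theta, sigma^2)."""
    return theta**2 / (2.0 * sigma**2)


def add_first_order(alpha, rho, kl10):
    """Leading term of (3): |log alpha| / (D(f1 || f0) + |log(1 - rho)|)."""
    return abs(math.log(alpha)) / (kl10 + abs(math.log(1.0 - rho)))


if __name__ == "__main__":
    kl10 = kl_gaussian_mean_shift(0.8)  # 0.32
    for alpha in (1e-2, 1e-3, 1e-4):
        print(alpha, round(add_first_order(alpha, 0.01, kl10), 2))
```

For $\rho = 0.01$ the prior term $|\log(1-\rho)| \approx 0.01$ adds only about 3% to the denominator here, so the delay is dominated by the divergence term.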

Remark 3: Equation (4) shows that PFA is not a function of the threshold $B$. In [14], it is shown that as $\alpha \to 0$ and as $\rho \to 0$, ANO is a function of $B$ alone. Thus, for reasonable values of the constraints $\alpha$ and $\zeta$, the constraints can be met independently of each other.



Remark 4: The statement of Theorem 2.1 is stronger than the claim that the DE-Shiryaev algorithm is asymptotically optimal. This is true because

$$\mathrm{PFA}(\Psi(A, B)) = E[1 - p_{\tau_d}] \le 1 - A.$$

Thus, with $A = 1 - \alpha$, $\mathrm{PFA}(\Psi(A, B)) \le \alpha$, and with $B$ chosen as mentioned in the theorem, (3) alone establishes the asymptotic optimality of the DE-Shiryaev algorithm.
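The bound in Remark 4 can be checked by simulation. The sketch below assumes the Fig. 1 setting ($f_0 = \mathcal{N}(0,1)$, $f_1 = \mathcal{N}(0.8,1)$, $\rho = 0.01$, $A = 0.9$, $B = 0.2$), for which $\log L(x) = \theta(x - \theta/2)$ with $\theta = 0.8$; the function names, seed and number of runs are arbitrary choices. Because $p_{\tau_d} > A$ on every run, the sample mean of $1 - p_{\tau_d}$ is below $1 - A$ by construction, and the empirical false-alarm frequency should agree with it up to Monte Carlo error.

```python
import math
import random


def simulate_run(rng, rho, A, B, theta=0.8):
    """One DE-Shiryaev run with Gamma ~ Geometric(rho); returns (false_alarm, 1 - p_tau)."""
    gamma = 1
    while rng.random() >= rho:         # P(Gamma = n) = (1 - rho)^(n - 1) * rho
        gamma += 1
    p, n = 0.0, 0
    while True:
        n += 1
        take = p >= B                  # M_n decided from p_{n-1}
        p_tilde = p + (1.0 - p) * rho
        if take:
            x = rng.gauss(theta if n >= gamma else 0.0, 1.0)
            num = p_tilde * math.exp(theta * (x - theta / 2.0))   # uses L(X_n)
            p = num / (num + 1.0 - p_tilde)
        else:
            p = p_tilde
        if p > A:
            return n < gamma, 1.0 - p


def estimate_pfa(num_runs=2000, rho=0.01, A=0.9, B=0.2, seed=7):
    """Return (empirical PFA, sample mean of 1 - p_tau)."""
    rng = random.Random(seed)
    runs = [simulate_run(rng, rho, A, B) for _ in range(num_runs)]
    pfa_hat = sum(1 for false_alarm, _ in runs if false_alarm) / num_runs
    mean_gap = sum(gap for _, gap in runs) / num_runs
    return pfa_hat, mean_gap


if __name__ == "__main__":
    print(estimate_pfa())
```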

Remark 5: Although (3) is true for each fixed value of $\zeta$, as $\zeta$ becomes smaller, a much smaller value of $\alpha$ is needed before the asymptotics "kick in".

Fig. 2 compares the performance of the Shiryaev algorithm, the DE-Shiryaev algorithm and the fractional sampling scheme, for $\zeta = 50$. In the fractional sampling scheme, the Shiryaev algorithm is used and samples are skipped by tossing a biased coin (with probability of success 50/99), without looking at the state of the system. When a sample is skipped in the fractional sampling scheme, the Shiryaev statistic is updated using the prior on the change point. The figure clearly shows a substantial gap in performance between the DE-Shiryaev algorithm and the fractional sampling scheme.

More accurate estimates of the delay and of ANO are available in [14].
[Plot: $f_0 = \mathcal{N}(0,1)$, $f_1 = \mathcal{N}(0.8,1)$, $\rho = 0.01$, 50% samples dropped; axis label $|\log(\mathrm{PFA})|$]

Fig. 2: Comparative performance of schemes for $f_0 \sim \mathcal{N}(0,1)$, $f_1 \sim \mathcal{N}(0.8,1)$, and $\rho = 0.01$.
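For comparison, a minimal sketch of the fractional sampling baseline described above. We assume (our reading, not stated explicitly in the text) that the success probability 50/99 equals $\zeta/(E[\Gamma] - 1)$ with $E[\Gamma] = 1/\rho = 100$ and $\zeta = 50$; the function names are hypothetical and the Gaussian pair $\mathcal{N}(0,1)$, $\mathcal{N}(0.8,1)$ is assumed. The coin is tossed independently of the data, and a skipped sample updates the Shiryaev statistic through the prior only.

```python
import math
import random
from fractions import Fraction


def coin_probability(zeta, rho):
    """zeta / (E[Gamma] - 1) with E[Gamma] = 1 / rho (assumed origin of 50/99)."""
    rho = Fraction(str(rho))
    return Fraction(zeta) / (1 / rho - 1)


def fractional_shiryaev(xs, rho, A, q, rng, theta=0.8):
    """Shiryaev rule with data-independent coin-toss sampling; returns (tau or None, used)."""
    p, used = 0.0, 0
    for n, x in enumerate(xs, start=1):
        p_tilde = p + (1.0 - p) * rho
        if rng.random() < q:                        # coin says: take X_n
            num = p_tilde * math.exp(theta * (x - theta / 2.0))
            p = num / (num + 1.0 - p_tilde)
            used += 1
        else:                                       # coin says: skip X_n
            p = p_tilde
        if p > A:
            return n, used
    return None, used


if __name__ == "__main__":
    q = float(coin_probability(50, 0.01))           # 50/99
    rng = random.Random(3)
    xs = [rng.gauss(0.8 if n >= 100 else 0.0, 1.0) for n in range(1, 600)]
    print(q, fractional_shiryaev(xs, rho=0.01, A=0.9, q=q, rng=rng))
```

The only difference from the DE-Shiryaev sketch is the sampling decision: here it ignores $p_n$, which is exactly what the DE-Shiryaev rule exploits.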

C. Insights from the Bayesian setting

We make the following observations on the evolution of the statistic $p_n$ in Fig. 1.

1) Let

$$t(B) = \inf\{n \ge 1 : p_n > B\}.$$

Then after $t(B)$, the number of samples skipped when $p_n$ goes below $B$ is a function of the undershoot of $p_n$ and the geometric parameter $\rho$. If $L^*(X_n)$ is defined as

$$L^*(X_n) = \begin{cases} L(X_n) & \text{if } M_n = 1, \\ 1 & \text{if } M_n = 0, \end{cases}$$

then $\frac{p_n}{1 - p_n}$ can be shown to be equal to

$$\frac{p_n}{1 - p_n} = \frac{\sum_{k=1}^{n} P(\Gamma = k) \prod_{i=k}^{n} L^*(X_i)}{P(\Gamma > n)}.$$

Thus $\frac{p_n}{1 - p_n} P(\Gamma > n)$ is the average likelihood ratio of all the observations taken till time $n$, and since, for fixed $n$, there is a one-to-one mapping between $p_n$ and this quantity, we see that the number of samples skipped is a function of the likelihood ratio of the observations taken.

2) When $p_n$ crosses $B$ from below, it does so with an overshoot that is bounded by $\rho$. This is because

$$p_{n+1} - p_n = (1 - p_n)\rho \le \rho.$$

For small values of $\rho$, this overshoot is essentially zero, and the evolution of $p_n$ is roughly statistically independent of its past evolution. Thus, beyond $t(B)$, the evolution of $p_n$ can be seen as a sequence of two-sided, statistically independent tests, each two-sided test being a test for sequential hypothesis testing between "$H_0$ = pre-change" and "$H_1$ = post-change". If the decision in the two-sided test is $H_0$, then samples are skipped depending on the likelihood ratio of the observations, and the two-sided test is repeated on the samples beyond the skipped samples. The change is declared the first time the decision in a two-sided test is $H_1$.

3) Because of the above interpretation of the evolution of the DE-Shiryaev algorithm as a sequence of roughly independent two-sided tests, we see that the constraint on the observation cost is met by delaying the measurement process on the basis of the prior statistical knowledge of the change point, and then, beyond $t(B)$, controlling the fraction of time $p_n$ is above $B$, i.e., controlling the fraction of time samples are taken.
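The identity in item 1 can be checked numerically. The sketch below (hypothetical function names) computes $p_n/(1-p_n)$ once through the DE-Shiryaev recursion and once through the closed-form ratio, with $L^*(X_k) = 1$ standing in for a skipped sample; this works because plugging $L = 1$ into the Bayes update returns $\tilde{p}_n$, the skip update. The random likelihood-ratio values are arbitrary test inputs, not data from the paper.

```python
import random


def odds_by_recursion(lstar, rho):
    """p_n / (1 - p_n) from the DE-Shiryaev recursion; L* = 1 encodes a skipped sample."""
    p = 0.0
    for l in lstar:
        p_tilde = p + (1.0 - p) * rho
        p = p_tilde * l / (p_tilde * l + 1.0 - p_tilde)   # equals p_tilde when l == 1
    return p / (1.0 - p)


def odds_by_formula(lstar, rho):
    """[sum_{k=1}^n P(Gamma = k) prod_{i=k}^n L*(X_i)] / P(Gamma > n)."""
    n = len(lstar)
    total = 0.0
    for k in range(1, n + 1):
        prod = 1.0
        for l in lstar[k - 1:]:
            prod *= l
        total += (1.0 - rho) ** (k - 1) * rho * prod
    return total / (1.0 - rho) ** n


if __name__ == "__main__":
    rng = random.Random(3)
    lstar = [rng.lognormvariate(0.0, 0.5) if rng.random() < 0.5 else 1.0 for _ in range(15)]
    print(odds_by_recursion(lstar, 0.05), odds_by_formula(lstar, 0.05))
```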

These insights will be crucial to the development of the theory for data-efficient quickest change detection in the non-Bayesian setting.

III. Data-Efficient Minimax Formulation

In the absence of prior knowledge of the distribution of the change point, as is standard in the classical quickest change detection literature, we model the change point as an unknown constant $\gamma$. As a result, the quantities ADD, PFA and ANO in Problem 1 are not well defined. Thus, we study the data-efficient quickest change detection problem in a minimax setting. In this paper we consider the two most popular minimax formulations: one is due to Pollak [7] and the other is due to Lorden [6].

We will use the insights from the Bayesian setting of Section II to study data-efficient minimax quickest change detection. Our development will essentially follow the layout of the Bayesian setting. Specifically, we first propose two minimax formulations for data-efficient quickest change detection. Motivated by the structure of the DE-Shiryaev algorithm, we then propose an algorithm for data-efficient quickest change detection in the minimax settings. This algorithm is a generalized version of the CuSum algorithm [13]. We call this algorithm the DE-CuSum algorithm. We show that the DE-CuSum algorithm is asymptotically optimal under both minimax settings. We also show that in the DE-CuSum algorithm, the constraints on false alarm and observation cost can be met independently of each other. Finally, we show that we can achieve a substantial gain in performance by using the DE-CuSum algorithm as compared to the approach of fractional sampling.

We first propose a metric for data-efficiency in a non-Bayesian setting. In Section II-C we saw that in the DE-Shiryaev algorithm, the observation cost constraint is met using an initial wait, and by controlling the fraction of time observations are taken after the initial wait. In the absence of prior statistical knowledge on the change point, such an initial wait cannot be justified. This motivates us to seek control policies that can meet a constraint on the fraction of time observations are taken before change. With $M_n$, $I_n$, $\tau$, and $\Psi$ as defined earlier in Section II, we propose the following duty-cycle-based observation cost metric, Pre-change Duty Cycle (PDC):

$$\mathrm{PDC}(\Psi) = \limsup_{n \to \infty} \frac{1}{n} E_n\left[\sum_{k=1}^{n-1} M_k \,\middle|\, \tau \ge n\right]. \tag{5}$$

Clearly, $\mathrm{PDC} \le 1$.

We now discuss why we use limsup rather than sup in defining PDC. In all reasonable policies $M_1$ will typically be set to 1. As mentioned earlier, this is because an initial wait cannot be justified without prior statistical knowledge of the change point. As a result, in (5), we cannot replace the limsup by sup, because the latter would give us a PDC value of 1. Even otherwise, without any prior knowledge on the change point, it is reasonable to assume that the value of $\gamma$ is large, and hence the PDC metric defined in (5) is a reasonable metric for our problem.

For false alarm, we consider the metric used in [6] and [7], the mean time to false alarm, or its reciprocal, the false alarm rate:

$$\mathrm{FAR}(\Psi) = \frac{1}{E_\infty[\tau]}. \tag{6}$$

For delay we consider two possibilities: the minimax setting of Pollak [7], where the delay metric is the following supremum over time of the conditional delay¹:

$$\mathrm{CADD}(\Psi) = \sup_{n \ge 1} E_n[\tau - n \mid \tau \ge n], \tag{7}$$

or the minimax setting of Lorden [6], where the delay metric is the supremum over time of the essential supremum of the conditional delay:

$$\mathrm{WADD}(\Psi) = \sup_{n \ge 1} \operatorname{ess\,sup} E_n\left[(\tau - n)^+ \mid I_{n-1}\right]. \tag{8}$$

Note that unlike the delay metric in [6], WADD in (8) is a function of the observation control through $I_{n-1}$, which may not contain the entire set of observations.

Since $\{\tau \ge n\}$ belongs to the sigma-algebra generated by $I_{n-1}$, we have

$$\mathrm{CADD}(\Psi) \le \mathrm{WADD}(\Psi).$$

Our first minimax formulation is the following data-efficient extension of Pollak [7].
¹We are only interested in those policies for which the CADD is well defined.
Problem 2:

$$\begin{aligned} &\text{minimize } \mathrm{CADD}(\Psi), \\ &\text{subject to } \mathrm{FAR}(\Psi) \le \alpha, \\ &\text{and } \mathrm{PDC}(\Psi) \le \beta. \end{aligned} \tag{9}$$

Here, $0 \le \alpha, \beta \le 1$ are given constraints.

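We are also interested in the data-efficient extension of the minimax formulation of Lorden [6].
Problem 3:

$$\begin{aligned} &\text{minimize } \mathrm{WADD}(\Psi), \\ &\text{subject to } \mathrm{FAR}(\Psi) \le \alpha, \\ &\text{and } \mathrm{PDC}(\Psi) \le \beta. \end{aligned} \tag{10}$$

Here, $0 \le \alpha, \beta \le 1$ are given constraints.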

Remark 6: With $\beta = 1$, Problem 2 reduces to the minimax formulation of Pollak in [7], and Problem 3 reduces to the minimax formulation of Lorden in [6].

In [13], the following algorithm, called the CuSum algorithm, is proposed:

Algorithm 2 (CuSum: $\Psi_C$): Start with $C_0 = 0$, and update the statistic $C_n$ as

$$C_{n+1} = \left(C_n + \log L(X_{n+1})\right)^+,$$

where $(x)^+ = \max\{0, x\}$. Stop at

$$\tau_C = \inf\{n \ge 1 : C_n > D\}.$$
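A minimal Python sketch of Algorithm 2, assuming $f_0 = \mathcal{N}(0,1)$ and $f_1 = \mathcal{N}(\theta,1)$ so that $\log L(x) = \theta(x - \theta/2)$; `gaussian_llr` and `cusum` are hypothetical names, and $\theta = 0.75$ is borrowed from the setting of Fig. 3 only for illustration.

```python
import random


def gaussian_llr(x, theta=0.75):
    """log L(x) for f0 = N(0, 1) and f1 = N(theta, 1)."""
    return theta * (x - theta / 2.0)


def cusum(xs, D, llr=gaussian_llr):
    """CuSum: C_0 = 0, C_{n+1} = (C_n + log L(X_{n+1}))^+; stop at first n with C_n > D."""
    c = 0.0
    for n, x in enumerate(xs, start=1):
        c = max(0.0, c + llr(x))
        if c > D:
            return n
    return None


if __name__ == "__main__":
    rng = random.Random(3)
    gamma = 40  # change point used in Fig. 3
    xs = [rng.gauss(0.75 if n >= gamma else 0.0, 1.0) for n in range(1, 300)]
    print("tau_C =", cusum(xs, D=7.0))
```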

It is shown by Lai in [10] that the CuSum algorithm is asymptotically optimal for both Problem 2 and Problem 3 with $\beta = 1$, as $\alpha \to 0$ (see Section V-B for a precise statement).

In the following we propose the DE-CuSum algorithm, an extension of the CuSum algorithm to the data-efficient setting, and show that it is asymptotically optimal, for each fixed $\beta$, as $\alpha \to 0$; see Section V-E.

IV. The DE-CuSum algorithm
We now present the DE-CuSum algorithm.

Algorithm 3 (DE-CuSum: $\Psi_W(D, \mu, h)$): Start with $W_0 = 0$ and fix $\mu > 0$, $D > 0$ and $h \ge 0$. For $n \ge 0$ use the following control:

$$M_{n+1} = \phi_n(W_n) = \begin{cases} 0 & \text{if } W_n < 0, \\ 1 & \text{if } W_n \ge 0, \end{cases}$$

$$\tau_W = \inf\{n \ge 1 : W_n > D\}.$$

The statistic $W_n$ is updated using the following recursions:

$$W_{n+1} = \begin{cases} \min\{W_n + \mu, 0\} & \text{if } M_{n+1} = 0, \\ \left(W_n + \log L(X_{n+1})\right)^{h+} & \text{if } M_{n+1} = 1, \end{cases}$$

where $(x)^{h+} = \max\{x, -h\}$.
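A minimal Python sketch of Algorithm 3 under the same Gaussian assumption ($f_0 = \mathcal{N}(0,1)$, $f_1 = \mathcal{N}(0.75,1)$, the setting of Fig. 3). The names are hypothetical and this is an illustration, not the authors' code; `h=math.inf` gives the untruncated version, and entries of the input sequence at skipped times are simply never read, which mirrors $M_n = 0$.

```python
import math
import random


def gaussian_llr(x, theta=0.75):
    """log L(x) for f0 = N(0, 1) and f1 = N(theta, 1)."""
    return theta * (x - theta / 2.0)


def de_cusum(xs, D, mu, h=math.inf, llr=gaussian_llr):
    """DE-CuSum Psi_W(D, mu, h) on xs, where xs[0] plays the role of X_1.

    Returns (tau_W or None, number of samples used, [W_0, W_1, ...]).
    Entries of xs at skipped times are never read (M_n = 0).
    """
    w = 0.0
    used = 0
    trajectory = [w]
    for n, x in enumerate(xs, start=1):
        if w >= 0.0:                        # M_n = 1: use X_n
            w = max(w + llr(x), -h)         # (W_{n-1} + log L(X_n))^{h+}
            used += 1
        else:                               # M_n = 0: skip X_n
            w = min(w + mu, 0.0)            # climb back to 0 at rate mu
        trajectory.append(w)
        if w > D:                           # tau_W = inf{n >= 1 : W_n > D}
            return n, used, trajectory
    return None, used, trajectory


if __name__ == "__main__":
    rng = random.Random(5)
    gamma = 40  # change point used in Fig. 3
    xs = [rng.gauss(0.75 if n >= gamma else 0.0, 1.0) for n in range(1, 300)]
    for h in (math.inf, 0.5):
        tau_w, used, _ = de_cusum(xs, D=7.0, mu=0.1, h=h)
        print("h =", h, "tau_W =", tau_w, "samples used =", used)
```

Passing `h=0.0` turns the truncation into $\max\{\cdot, 0\}$, so the statistic never goes negative and no sample is skipped.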

When $h = \infty$, the DE-CuSum algorithm works as follows. The statistic $W_n$ starts at 0 and evolves according to the CuSum algorithm until it goes below 0. When $W_n$ goes below 0, it does so with an undershoot. Beyond this, $W_n$ is incremented deterministically (by using the recursion $W_{n+1} = W_n + \mu$), and observations are skipped until $W_n$ crosses 0 from below. As a consequence, the number of observations that are skipped is determined by the undershoot (log likelihood ratio of the observations) as well as the parameter $\mu$. When $W_n$ crosses 0 from below, it is reset to 0. Once $W_n = 0$, the process renews itself and continues to evolve this way until $W_n > D$, at which time a change is declared.

If $h < \infty$, $W_n$ is truncated to $-h$ when $W_n$ goes below 0 from above. In other words, the undershoot is reset to $-h$ if its magnitude is larger than $h$. A finite value of $h$ guarantees that the number of samples skipped is bounded by $h/\mu + 1$. This feature will be crucial to the WADD analysis of the DE-CuSum algorithm in Section V-D.



If $h = 0$, the DE-CuSum statistic $W_n$ never becomes negative, and hence it reduces to the CuSum statistic and evolves as: $W_0 = 0$, and for $n \ge 0$,

$$W_{n+1} = \max\{0, W_n + \log L(X_{n+1})\}.$$

Thus, $\mu$ is a substitute for the Bayesian prior $\rho$ that is used in the DE-Shiryaev algorithm described in Section II-A. But unlike $\rho$, which represents prior statistical knowledge of the change point, $\mu$ is a design parameter. An appropriate value of $\mu$ is selected to meet the constraint on PDC; see Section V-A for details.
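The reduction to the CuSum recursion for $h = 0$ can be checked numerically. The sketch below (hypothetical names) feeds the same sequence of increments $\log L(X_n)$ to both recursions; with $h = 0$ no sample is ever skipped, so the two trajectories should coincide, while with $h = \infty$ the DE-CuSum trajectory dips below zero, which is when skipping starts. The increments are drawn from the pre-change law of $\log L(X)$ for the assumed pair $\mathcal{N}(0,1)$, $\mathcal{N}(0.75,1)$.

```python
import random


def de_cusum_trajectory(llrs, mu, h):
    """W_1, W_2, ... of DE-CuSum, given the increments log L(X_n) at every time n.

    When W_{n-1} < 0 the n-th increment is ignored, mirroring a skipped sample.
    """
    w, out = 0.0, []
    for z in llrs:
        w = max(w + z, -h) if w >= 0.0 else min(w + mu, 0.0)
        out.append(w)
    return out


def cusum_trajectory(llrs):
    """C_1, C_2, ... of the CuSum recursion C_n = max{0, C_{n-1} + log L(X_n)}."""
    c, out = 0.0, []
    for z in llrs:
        c = max(0.0, c + z)
        out.append(c)
    return out


if __name__ == "__main__":
    rng = random.Random(11)
    theta = 0.75
    # Pre-change increments: log L(X) = theta * (X - theta / 2) with X ~ N(0, 1)
    llrs = [theta * (rng.gauss(0.0, 1.0) - theta / 2.0) for _ in range(500)]
    print(de_cusum_trajectory(llrs, mu=0.1, h=0.0) == cusum_trajectory(llrs))
```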

The evolution of the DE-CuSum algorithm is plotted in Fig. 3. In analogy with the evolution of the DE-Shiryaev algorithm, the DE-CuSum algorithm can also be seen as a sequence of independent two-sided tests. In each two-sided test, a Sequential Probability Ratio Test (SPRT) [21], with log boundaries $D$ and 0, is used to distinguish between the two hypotheses "$H_0$ = pre-change" and "$H_1$ = post-change". If the decision in the SPRT is in favor of $H_0$, then samples are skipped based on the likelihood ratio of all the observations taken in the SPRT. A change is declared the first time the decision in the sequence of SPRTs is in favor of $H_1$. If $h = 0$, no samples are skipped and the DE-CuSum algorithm reduces to the CuSum algorithm, i.e., to a sequence of SPRTs (also see [20]).

Fig. 3: Typical evolution of $W_n$ for $f_0 \sim \mathcal{N}(0,1)$, $f_1 \sim \mathcal{N}(0.75,1)$, $\gamma = 40$, $D = 7$, $\mu = 0.1$, with two different values of $h$: $h = \infty$ and $h = 0.5$. When $h = 0.5$, the undershoots are truncated at $-0.5$.

Unless a bound on the maximum number of samples skipped is required, the DE-CuSum algorithm can be controlled by just two parameters: $D$ and $\mu$. We will show in the following that these two parameters can be selected independently of each other, directly from the constraints. That is, the threshold $D$ can be selected so that $\mathrm{FAR} \le \alpha$, independent of the value of $\mu$. Also, it is possible to select a value of $\mu$ such that $\mathrm{PDC} \le \beta$, independent of the choice of $D$.
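To see how $\mu$ alone controls the sampling rate, the sketch below estimates the long-run fraction of time steps at which the DE-CuSum algorithm takes a sample when no change occurs. It is only a rough proxy for PDC in (5): it assumes $D = \infty$ (the test never stops, which approximates the regime of rare false alarms), the Gaussian pair $\mathcal{N}(0,1)$, $\mathcal{N}(0.75,1)$, and hypothetical names. A larger $\mu$ shortens each sojourn below zero and so raises the fraction of samples taken, while $h = 0$ gives a duty cycle of exactly 1.

```python
import math
import random


def long_run_duty_cycle(mu, h=math.inf, theta=0.75, horizon=200_000, seed=0):
    """Fraction of time steps at which DE-CuSum takes a sample when no change occurs.

    Assumes f0 = N(0, 1), f1 = N(theta, 1) and D = infinity (the test never stops),
    so this is only a rough proxy for PDC when false alarms are rare.
    """
    rng = random.Random(seed)
    w, taken = 0.0, 0
    for _ in range(horizon):
        if w >= 0.0:                                   # M_n = 1
            x = rng.gauss(0.0, 1.0)                    # pre-change sample
            w = max(w + theta * (x - theta / 2.0), -h)
            taken += 1
        else:                                          # M_n = 0
            w = min(w + mu, 0.0)
    return taken / horizon


if __name__ == "__main__":
    for mu in (0.05, 0.1, 0.5):
        print(mu, round(long_run_duty_cycle(mu), 3))
```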

Remark 7: With the way the DE-CuSum algorithm is defined, we will see in the following that it may not be possible to meet PDC constraints that are close to 1 with equality. We ignore this issue in the rest of the paper, as in many practical settings the preferred value of PDC would be closer to 0 than to 1. But we remark that the DE-CuSum algorithm can be easily modified to achieve PDC values that are close to 1 by resetting $W_n$ to zero if the undershoot is smaller than a pre-designed threshold.

Remark 8: One can also modify the Shiryaev-Roberts algorithm [12] and obtain a two-threshold version of it, with an upper threshold used for stopping and a lower threshold used for on-off observation control. Also note that the SPRTs of the two-sided tests considered above have a lower threshold of 0. One can also propose variants of the DE-CuSum algorithm with a negative lower threshold for the SPRTs.
Remark 9: For the CuSum algorithm, the supremum in (7) and (8) is achieved when the change is applied at time $n = 1$ (see also [24]). This is useful from the point of view of simulating the test. However, in the data-efficient setting, since the information vector also contains information about missed samples, the worst-case change point for a policy $\Psi$ would depend on the observation control and may not be $n = 1$. But note that in the DE-CuSum algorithm, the test statistic evolves as a Markov process. As a result, the worst case usually occurs in the initial slots, before the process reaches stationarity. This is useful from the point of view of simulating the algorithm. In the analysis of the DE-CuSum algorithm provided in Section V below, we will see that the WADD of the DE-CuSum algorithm is equal to its delay when the change occurs at $n = 1$, plus a constant. Similarly, even if computing the PDC may be a bit difficult using simulations, we will provide a simple, numerically computable upper bound on the PDC of the DE-CuSum algorithm that can be used to ensure that the PDC constraint is satisfied.



V. Analysis and design of the DE-CuSum algorithm 

The interpretation of the DE-CuSum algorithm as a sequence of two-sided tests will now be used in this section to perform its asymptotic analysis.

Recall that the DE-CuSum algorithm can be seen as a sequence of two-sided tests, where each two-sided test consists of an SPRT and a possible sojourn below zero, the length of the latter being dependent on the likelihood ratio of the observations.

Define the following two functions:

$$\Phi(W_k) = W_k + \log L(X_{k+1}),$$

and

$$\Theta(W_k) = W_k + \mu.$$

Using these functions we define the stopping time for an SPRT:

$$\lambda_D = \inf\{n \ge 1 : \Phi(W_{n-1}) \notin [0, D],\ W_0 = 0\}. \tag{11}$$

At the stopping time $\lambda_D$ for the SPRT, if the statistic $W_{\lambda_D} = x < 0$, then the time spent below zero is equal to $T(x, 0, \mu)$, where for $x < y$

$$T(x, y, \mu) = \inf\{n \ge 1 : \Theta(W_{n-1}) \ge y,\ W_0 = x\}, \tag{12}$$

with $T(0, 0, \mu) = 0$. Note that

$$T(x, y, \mu) = \left\lceil \frac{y - x}{\mu} \right\rceil. \tag{13}$$
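Equation (13) holds because, starting from $W_0 = x$, the statistic climbs by exactly $\mu$ per step, so $T$ is the smallest $n$ with $x + n\mu \ge y$. With truncation, $x \ge -h$, which gives $T(x, 0, \mu) \le \lceil h/\mu \rceil \le h/\mu + 1$, the bound on skipped samples quoted in Section IV. The sketch below compares a direct iteration of (12) with the closed form; the names are hypothetical, and the inputs are random test values (exact multiples of $\mu$, where floating-point ties could matter, occur with probability zero).

```python
import math
import random


def sojourn_by_iteration(x, y, mu):
    """T(x, y, mu) from (12): first n >= 1 with W_{n-1} + mu >= y, where W_0 = x."""
    n, w = 0, x
    while True:
        n += 1
        if w + mu >= y:   # Theta(W_{n-1}) >= y
            return n
        w += mu           # W_n = Theta(W_{n-1})


def sojourn_closed_form(x, y, mu):
    """Equation (13): T(x, y, mu) = ceil((y - x) / mu)."""
    return math.ceil((y - x) / mu)


if __name__ == "__main__":
    rng = random.Random(2)
    x, mu = -rng.uniform(0.001, 3.0), 0.1
    print(sojourn_by_iteration(x, 0.0, mu), sojourn_closed_form(x, 0.0, mu))
```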
We also define the stopping time for the two-sided test:

$$\tilde{\lambda}_D = \lambda_D + T\left((W_{\lambda_D})^{h+}, 0, \mu\right) \mathbb{I}_{\{W_{\lambda_D} < 0\}}. \tag{14}$$

Let $\lambda_\infty$ be the variable $\lambda_D$ when the threshold $D = \infty$.
Fig. 4: Evolution of $W_n$ for $f_0 \sim \mathcal{N}(0,1)$, $f_1 \sim \mathcal{N}(0.75,1)$, and $\gamma = 40$, with $D = 7$, $h = \infty$, and $\mu = 0.1$. The two-sided tests with the distribution of $\tilde{\lambda}_D$ are shown in the figure. Also shown are the two components of $\tilde{\lambda}_D$: $\lambda_D$ and $T(x, y, \mu)$.



To summarize, the variables $\tilde{\lambda}_D$, $\lambda_D$ and $T(x, y, \mu)$ should be interpreted as follows. The DE-CuSum algorithm can be seen as a sequence of two-sided tests, with the stopping time of each two-sided test distributed according to the law of $\tilde{\lambda}_D$. Each of these two-sided tests consists of an SPRT with stopping time distributed according to the law of $\lambda_D$, and a sojourn of length $T((W_{\lambda_D})^{h+}, 0, \mu)$ corresponding to the time for which the statistic $W_n$ is below 0, provided that at the stopping time of the SPRT the accumulated log likelihood is negative, i.e., the event $\{W_{\lambda_D} < 0\}$ happens. See Fig. 4.

The CuSum algorithm can also be seen as a sequence of SPRTs, with the stopping time of each SPRT 
distributed according to the law of A D (see [20]). 
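The decomposition described above can be illustrated with a short simulation. This is a sketch under stated assumptions, not the authors' code. It assumes the DE-CuSum recursion of Section IV in the following form: when $W_{n-1} \ge 0$ the sample is taken ($S_n = 1$) and $W_n = \max\{W_{n-1} + \log L(X_n), -h\}$; when $W_{n-1} < 0$ the sample is skipped ($S_n = 0$) and $W_n = \min\{W_{n-1} + \mu, 0\}$. It also uses the Gaussian pair of Fig. 4, $f_0 = \mathcal{N}(0,1)$ and $f_1 = \mathcal{N}(0.75,1)$. For $h > 0$, each maximal run of taken samples is one SPRT, and each run of skipped samples is one sojourn of length $T((W_{\lambda_D})^{h+}, 0, \mu)$. The names `de_cusum`, `split_into_runs` and `THETA` are ours.

```python
import math
import random

THETA = 0.75  # assumed post-change mean: f0 = N(0,1), f1 = N(THETA,1)


def log_lr(x):
    """log L(x) = log f1(x)/f0(x) for the Gaussian mean-shift pair."""
    return THETA * x - THETA * THETA / 2.0


def de_cusum(xs, D, mu, h):
    """Run the (assumed) DE-CuSum recursion on the observation sequence xs.

    Returns (tau, flags, path): tau is the first n with W_n > D (None if the
    threshold is never crossed), flags[n-1] = S_n (1 = X_n observed) and
    path[n-1] = W_n. Skipped observations are never looked at.
    """
    w, flags, path = 0.0, [], []
    for n, x in enumerate(xs, start=1):
        if w < 0:                       # S_n = 0: sojourn below zero
            flags.append(0)
            w = min(w + mu, 0.0)
        else:                           # S_n = 1: one more SPRT step
            flags.append(1)
            w = max(w + log_lr(x), -h)  # truncation at -h (no-op if h = inf)
        path.append(w)
        if w > D:
            return n, flags, path
    return None, flags, path


def split_into_runs(flags):
    """(flag, length) of maximal runs of equal flags: SPRTs and sojourns alternate."""
    runs = []
    for s in flags:
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1
        else:
            runs.append([s, 1])
    return [(s, length) for s, length in runs]


if __name__ == "__main__":
    rng = random.Random(1)
    gamma = 40  # change point, as in Fig. 4
    xs = [rng.gauss(0.0, 1.0) for _ in range(gamma - 1)]
    xs += [rng.gauss(THETA, 1.0) for _ in range(2000)]
    tau, flags, _ = de_cusum(xs, D=7.0, mu=0.1, h=math.inf)
    print("tau_W =", tau, "fraction observed =", sum(flags) / len(flags))
    print("first runs (S, length):", split_into_runs(flags)[:6])
```

For example, take the observations $(-2, 5, 5, \ldots)$ with $\mu = 0.5$ and $h = \infty$. The first SPRT ends after one sample at $W_1 = -1.78125$. The statistic then sleeps for $\lceil 1.78125/0.5 \rceil = 4$ slots, and a second SPRT crosses $D = 7$ after three more samples.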

We now provide some results on the means of $\lambda_D$ and $T(x, y, \mu)$ that will be used in the analysis of the DE-CuSum algorithm in Sections V-A, V-C and V-D.



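If $0 < D(f_0 \,\|\, f_1) < \infty$, then from Corollary 2.4 in [22],

$$\mathsf{E}_\infty[\lambda_\infty] < \infty, \tag{15}$$

and by Wald's lemma

$$\mathsf{E}_\infty[|W_{\lambda_\infty}|] = D(f_0 \,\|\, f_1)\, \mathsf{E}_\infty[\lambda_\infty] < \infty. \tag{16}$$

Also, for $h \ge 0$,

$$\mathsf{E}_\infty[|(W_{\lambda_\infty})^{h+}|] \le \mathsf{E}_\infty[|W_{\lambda_\infty}|] < \infty, \tag{17}$$

where the finiteness follows from (16).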
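A minimal Monte Carlo sketch of (15) and (16) follows. For illustration only, it assumes the Gaussian pair used in Section VI ($f_0 = \mathcal{N}(0,1)$, $f_1 = \mathcal{N}(0.75,1)$), for which $\log L(x) = 0.75x - 0.28125$ and $D(f_0 \,\|\, f_1) = 0.28125$. It samples $(\lambda_\infty, W_{\lambda_\infty})$ under $\mathsf{P}_\infty$ and checks that the ratio of the sample means of $|W_{\lambda_\infty}|$ and $\lambda_\infty$ is close to $D(f_0 \,\|\, f_1)$. Function names are hypothetical.

```python
import random

THETA = 0.75            # assumed: f0 = N(0,1), f1 = N(THETA,1)
KL_01 = THETA ** 2 / 2  # D(f0 || f1) for this pair


def below_zero_excursion(rng):
    """One draw of (lambda_inf, W_{lambda_inf}) under P_inf (X ~ f0): add up
    log L(X_k) = THETA * X_k - THETA^2 / 2 from W_0 = 0 until W drops below 0."""
    w, n = 0.0, 0
    while w >= 0.0:
        n += 1
        w += THETA * rng.gauss(0.0, 1.0) - THETA * THETA / 2
    return n, w


def wald_check(reps=20000, seed=7):
    """Sample means of lambda_inf and |W_{lambda_inf}|, and their ratio."""
    rng = random.Random(seed)
    tot_n = tot_abs_w = 0.0
    for _ in range(reps):
        n, w = below_zero_excursion(rng)
        tot_n += n
        tot_abs_w += abs(w)
    return tot_n / reps, tot_abs_w / reps, tot_abs_w / tot_n


if __name__ == "__main__":
    m_lam, m_w, ratio = wald_check()
    print(f"E[lambda_inf] ~ {m_lam:.3f}, E|W| ~ {m_w:.3f}, "
          f"ratio ~ {ratio:.4f} vs D(f0||f1) = {KL_01}")
```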



The lemma below shows that the quantity $\mathsf{E}_\infty[\lambda_D \mid W_{\lambda_D} < 0]$ is finite for every $D$, and it provides a finite upper bound on this quantity that is not a function of the threshold $D$. This result will be used in the PDC analysis in Section V-A.

Lemma 1: If $0 < D(f_0 \,\|\, f_1) < \infty$, then for any $D$, $\mathsf{E}_\infty[\lambda_D \mid W_{\lambda_D} < 0]$ is well defined and finite:

$$\mathsf{E}_\infty[\lambda_D \mid W_{\lambda_D} < 0] \le \frac{\mathsf{E}_\infty[\lambda_\infty]}{\mathsf{P}_\infty(\log L(X_1) < 0)} < \infty.$$

Proof: The proof of the first inequality is provided in the appendix. The second inequality is true by (15) and because $\mathsf{P}_\infty(\log L(X_1) < 0) > 0$. ■
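The following lemma provides upper and lower bounds on $\mathsf{E}_\infty[T((W_{\lambda_D})^{h+}, 0, \mu) \mid W_{\lambda_D} < 0]$ that are not a function of the threshold $D$. The upper bound will be useful in the FAR analysis in Section V-C, and the lower bound will be useful in the PDC analysis in Section V-A. Define

$$T_\ell^{(\infty)}(h, \mu) = \frac{\mathsf{E}_\infty\big[|(\log L(X_1))^{h+}| \,\big|\, \log L(X_1) < 0\big]\, \mathsf{P}_\infty(\log L(X_1) < 0)}{\mu} \tag{18}$$

and

$$T_u^{(\infty)}(h, \mu) = \frac{\mathsf{E}_\infty[|(W_{\lambda_\infty})^{h+}|]}{\mu\, \mathsf{P}_\infty(\log L(X_1) < 0)} + 1. \tag{19}$$

Lemma 2: If $0 < D(f_0 \,\|\, f_1) < \infty$ and $\mu > 0$, then

$$T_\ell^{(\infty)}(h, \mu) \le \mathsf{E}_\infty[T((W_{\lambda_D})^{h+}, 0, \mu) \mid W_{\lambda_D} < 0] \le T_u^{(\infty)}(h, \mu). \tag{20}$$

Moreover, $T_u^{(\infty)}(h, \mu) < \infty$, and if $h > 0$, then $T_\ell^{(\infty)}(h, \mu) > 0$.

Proof: The proof is provided in the appendix. ■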
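To see the sandwich (20) numerically, the following sketch estimates $T_\ell^{(\infty)}(h,\mu)$, the middle term of (20), and $T_u^{(\infty)}(h,\mu)$ by Monte Carlo. As an illustrative assumption, it uses the Gaussian pair $f_0 = \mathcal{N}(0,1)$, $f_1 = \mathcal{N}(0.75,1)$. It also uses $\mathsf{P}_\infty(\log L(X_1) < 0) = \Phi(\theta/2)$, which holds for this pair because $\log L(x) = \theta x - \theta^2/2$. Function names are hypothetical.

```python
import math
import random

THETA = 0.75  # assumed Gaussian pair: f0 = N(0,1), f1 = N(THETA,1)


def llr(rng):
    """log L(X) = THETA * X - THETA^2 / 2 with X ~ f0 (pre-change)."""
    return THETA * rng.gauss(0.0, 1.0) - THETA * THETA / 2


def sprt_end(rng, D):
    """Run one SPRT from W_0 = 0 under P_inf; return W_{lambda_D}."""
    w = 0.0
    while True:
        w += llr(rng)
        if w < 0 or w > D:
            return w


def trunc(x, h):
    """(x)^{h+} = max(x, -h): truncation of the undershoot at -h."""
    return max(x, -h)


def lemma2_terms(D, mu, h, reps=20000, seed=3):
    """Monte Carlo estimates of T_l^(inf)(h, mu), the middle term of (20),
    and T_u^(inf)(h, mu)."""
    rng = random.Random(seed)
    p_neg = 0.5 * (1.0 + math.erf(THETA / 2 / math.sqrt(2.0)))  # P_inf(log L(X_1) < 0)
    # (18): E[|(log L(X_1))^{h+}| ; log L(X_1) < 0] / mu
    t_low = sum(abs(trunc(min(llr(rng), 0.0), h)) for _ in range(reps)) / (reps * mu)
    # (19): E[|(W_{lambda_inf})^{h+}|] / (mu * p_neg) + 1
    e_abs = sum(abs(trunc(sprt_end(rng, math.inf), h)) for _ in range(reps)) / reps
    t_up = e_abs / (mu * p_neg) + 1.0
    # Middle term: mean of T((W)^{h+}, 0, mu) = ceil(-(W)^{h+} / mu) over SPRTs ending below 0
    total = count = 0
    for _ in range(reps):
        w = sprt_end(rng, D)
        if w < 0:
            total += math.ceil(-trunc(w, h) / mu)
            count += 1
    return t_low, total / count, t_up


if __name__ == "__main__":
    for h in (math.inf, 1.0):
        print(h, [round(v, 2) for v in lemma2_terms(D=3.0, mu=0.1, h=h)])
```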

The next lemma shows that $\mathsf{E}_1[T((W_{\lambda_D})^{h+}, 0, \mu) \mid W_{\lambda_D} < 0]$, the mean sojourn below zero under $\mathsf{P}_1$, is finite, and it obtains a finite upper bound on this quantity that is not a function of $D$. This result will be used for the CADD and WADD analysis in Section V-D. Let

$$T_u^{(1)}(h, \mu) = \frac{\mathsf{E}_\infty[|(W_{\lambda_\infty})^{h+}|]}{\mu\, \mathsf{P}_1(\log L(X_1) < 0)} + 1. \tag{21}$$

Lemma 3: If $0 < D(f_0 \,\|\, f_1) < \infty$ and $\mu > 0$, then

$$\mathsf{E}_1[T((W_{\lambda_D})^{h+}, 0, \mu) \mid W_{\lambda_D} < 0] \le T_u^{(1)}(h, \mu) < \infty.$$

Proof: The proof is provided in the appendix. ■



A. Meeting the PDC constraint 

In this section we show that the PDC metric is well defined for the DE-CuSum algorithm. In general, $\mathrm{PDC}(\Psi_W)$ will depend on both $D$ and $\mu$, apart from the obvious dependence on $f_0$ and $f_1$. However, we show that it is possible to choose a value of $\mu$ that ensures the PDC constraint of $\beta$ can be met independent of the choice of $D$. The latter is crucial to the asymptotic optimality proof of the DE-CuSum algorithm provided later in Section V-E.



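Theorem 5.1: For fixed values of $D$, $h$, and $\mu > 0$, if $0 < D(f_0 \,\|\, f_1) < \infty$, then

$$\mathrm{PDC}(\Psi_W(D, \mu, h)) = \frac{\mathsf{E}_\infty[\lambda_D \mid W_{\lambda_D} < 0]}{\mathsf{E}_\infty[\lambda_D \mid W_{\lambda_D} < 0] + \mathsf{E}_\infty[T((W_{\lambda_D})^{h+}, 0, \mu) \mid W_{\lambda_D} < 0]}. \tag{22}$$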
Proof: Consider an alternating renewal process $\{V_n, U_n\}$, i.e., a renewal process with renewal times $\{V_1, V_1 + U_1, V_1 + U_1 + V_2, \ldots\}$. Here $\{V_n\}$ are i.i.d. with the distribution of $\lambda_D$ conditioned on $\{W_{\lambda_D} < 0\}$, and $\{U_n\}$ are i.i.d. with the distribution of $T((W_{\lambda_D})^{h+}, 0, \mu)$ conditioned on $\{W_{\lambda_D} < 0\}$. Thus,

$$\mathsf{E}_\infty[V_1] = \mathsf{E}_\infty[\lambda_D \mid W_{\lambda_D} < 0]$$

and

$$\mathsf{E}_\infty[U_1] = \mathsf{E}_\infty[T((W_{\lambda_D})^{h+}, 0, \mu) \mid W_{\lambda_D} < 0].$$

Both means are finite by Lemma 1 and Lemma 2.

At time $n$, assign a reward $R_n = 1$ if the renewal cycle in progress has the law of $V_1$, and set $R_n = 0$ otherwise. Then, by the renewal reward theorem, almost surely

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n-1} R_k = \frac{\mathsf{E}_\infty[V_1]}{\mathsf{E}_\infty[V_1] + \mathsf{E}_\infty[U_1]}.$$

On $\{\tau_W \ge n\}$, the total number of observations taken up to time $n - 1$ has the same distribution as the total reward of the alternating renewal process defined above. Hence, the expected values of the average rewards of the two sequences must have the same limit:

$$\lim_{n \to \infty} \frac{1}{n}\, \mathsf{E}_\infty\Big[\sum_{k=1}^{n-1} S_k \,\Big|\, \tau_W \ge n\Big] = \frac{\mathsf{E}_\infty[\lambda_D \mid W_{\lambda_D} < 0]}{\mathsf{E}_\infty[\lambda_D \mid W_{\lambda_D} < 0] + \mathsf{E}_\infty[T((W_{\lambda_D})^{h+}, 0, \mu) \mid W_{\lambda_D} < 0]}. \tag{23}$$

■



Remark 10: If $h = 0$, then $\mathsf{E}_\infty[T((W_{\lambda_D})^{h+}, 0, \mu) \mid W_{\lambda_D} < 0] = 0$, and we get the PDC of the CuSum algorithm, which is equal to 1.
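Theorem 5.1 suggests a direct way to evaluate the PDC numerically. Simulate i.i.d. SPRTs under $\mathsf{P}_\infty$, keep those that end below zero, and form the ratio in (22). The sketch below does this for the Gaussian pair of Section VI, which is assumed here. It is a Monte Carlo estimate of (22), not the simulation procedure used for Table II, and the function names are ours.

```python
import math
import random

THETA = 0.75  # assumed: f0 = N(0,1), f1 = N(THETA,1)


def sprt_end(rng, D):
    """One SPRT under P_inf from W_0 = 0; returns (lambda_D, W_{lambda_D})."""
    w, n = 0.0, 0
    while True:
        n += 1
        w += THETA * rng.gauss(0.0, 1.0) - THETA * THETA / 2
        if w < 0 or w > D:
            return n, w


def pdc_renewal_estimate(D, mu, h, cycles=30000, seed=11):
    """Monte Carlo version of (22): mean SPRT length divided by mean SPRT length
    plus mean sojourn T((W)^{h+}, 0, mu), all conditioned on W_{lambda_D} < 0."""
    rng = random.Random(seed)
    obs = sleep = 0
    for _ in range(cycles):
        n, w = sprt_end(rng, D)
        if w > D:
            continue  # excluded by the conditioning on {W_{lambda_D} < 0}
        obs += n
        sleep += math.ceil(-max(w, -h) / mu)  # (13) with x = (W)^{h+}, y = 0
    return obs / (obs + sleep)


if __name__ == "__main__":
    for D in (1.0, 2.0, 4.0, 6.0):
        print(D, round(pdc_renewal_estimate(D, 0.1, math.inf), 3))
```

With $D = 6$, $\mu = 0.1$ and $h = \infty$, the estimate should be in the vicinity of the simulated value reported for $D = 6$ in Table II(a). With $h = 0$ it returns exactly 1, in agreement with Remark 10.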



As can be seen from (22), the PDC is a function of $D$ as well as of $h$ and $\mu$. We now show that, for any $D$ and $h > 0$, the DE-CuSum algorithm can be designed to meet any PDC constraint of $\beta$. Moreover, for a given $h > 0$, a value of $\mu$ can always be selected such that the PDC constraint of $\beta$ is met independent of the choice of $D$. The latter is convenient from a practical point of view, and it will also help in the asymptotic optimality proof of the DE-CuSum algorithm in Section V-E.



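Theorem 5.2: For the DE-CuSum algorithm, for any choice of $D$ and $h > 0$, if $0 < D(f_0 \,\|\, f_1) < \infty$, then we can always choose a value of $\mu$ to meet any given PDC constraint of $\beta$. Moreover, for any fixed value of $h > 0$, there exists a value of $\mu$, say $\mu^*(h)$, such that for every $D$,

$$\mathrm{PDC}(\Psi_W(D, \mu^*(h), h)) \le \beta.$$

In fact, any $\mu$ that satisfies

$$\mu \le \frac{\beta}{1 - \beta} \cdot \frac{\mathsf{E}_\infty\big[|(\log L(X_1))^{h+}| \,\big|\, \log L(X_1) < 0\big]\, \mathsf{P}_\infty(\log L(X_1) < 0)^2}{\mathsf{E}_\infty[\lambda_\infty]}$$

can be used as $\mu^*(h)$.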

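Proof: Note that $\mathsf{E}_\infty[\lambda_D \mid W_{\lambda_D} < 0]$ is not affected by the choice of $h$ and $\mu$. Moreover, from Lemma 2 and (18),

$$\mathsf{E}_\infty[T((W_{\lambda_D})^{h+}, 0, \mu) \mid W_{\lambda_D} < 0] \ge \frac{\mathsf{E}_\infty\big[|(\log L(X_1))^{h+}| \,\big|\, \log L(X_1) < 0\big]\, \mathsf{P}_\infty(\log L(X_1) < 0)}{\mu}.$$

Thus, for a given $D$ and $h$, $\mathsf{E}_\infty[T((W_{\lambda_D})^{h+}, 0, \mu) \mid W_{\lambda_D} < 0]$ increases as $\mu$ decreases, since by (13) $T(x, 0, \mu)$ is nonincreasing in $\mu$. By the bound above, it also grows without bound as $\mu \to 0$ when $h > 0$. Hence, the PDC decreases as $\mu$ decreases, and we can always select a $\mu$ small enough that the PDC is smaller than the given constraint of $\beta$.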
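Next, our aim is to find a $\mu^*$ such that, for every $D$,

$$\frac{\mathsf{E}_\infty[\lambda_D \mid W_{\lambda_D} < 0]}{\mathsf{E}_\infty[\lambda_D \mid W_{\lambda_D} < 0] + \mathsf{E}_\infty[T((W_{\lambda_D})^{h+}, 0, \mu^*) \mid W_{\lambda_D} < 0]} \le \beta.$$

The PDC increases as $\mathsf{E}_\infty[\lambda_D \mid W_{\lambda_D} < 0]$ increases and as $\mathsf{E}_\infty[T((W_{\lambda_D})^{h+}, 0, \mu) \mid W_{\lambda_D} < 0]$ decreases. Hence, from Lemma 1 and Lemma 2, we have

$$\mathrm{PDC}(\Psi_W) \le \frac{\mathsf{E}_\infty[\lambda_\infty]}{\mathsf{E}_\infty[\lambda_\infty] + T_\ell^{(\infty)}(h, \mu)\, \mathsf{P}_\infty(\log L(X_1) < 0)}.$$

Then, the theorem is proved if we select $\mu$ such that the right-hand side of the above equation is at most $\beta$, i.e., a $\mu$ that satisfies

$$\frac{\mathsf{E}_\infty\big[|(\log L(X_1))^{h+}| \,\big|\, \log L(X_1) < 0\big]\, \mathsf{P}_\infty(\log L(X_1) < 0)^2}{\mu\, \mathsf{E}_\infty[\lambda_\infty]} \ge \frac{1 - \beta}{\beta}.$$

■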
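The constructive part of Theorem 5.2 can be evaluated numerically. The sketch below assumes the Gaussian pair of Section VI and replaces the expectations by Monte Carlo averages; the function names are hypothetical. It computes the largest $\mu$ allowed by the theorem and plugs it back into the bound from the proof, which then equals $\beta$ up to rounding.

```python
import math
import random

THETA = 0.75  # assumed: f0 = N(0,1), f1 = N(THETA,1)


def llr_inf(rng):
    """log L(X) = THETA * X - THETA^2 / 2 with X ~ f0."""
    return THETA * rng.gauss(0.0, 1.0) - THETA * THETA / 2


def mu_star(beta, h, reps=50000, seed=5):
    """Largest mu allowed by the condition in Theorem 5.2 (expectations by
    Monte Carlo), together with the PDC bound from the proof at that mu."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie in (0, 1)")
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):          # E_inf[lambda_inf]
        w, n = 0.0, 0
        while w >= 0.0:
            n += 1
            w += llr_inf(rng)
        total += n
    e_lambda = total / reps
    p_neg = 0.5 * (1.0 + math.erf(THETA / 2 / math.sqrt(2.0)))  # P_inf(log L < 0)
    # E[|(log L)^{h+}| ; log L < 0] = E[|(log L)^{h+}| | log L < 0] * P(log L < 0)
    e_part = sum(abs(max(min(llr_inf(rng), 0.0), -h)) for _ in range(reps)) / reps
    mu = beta / (1.0 - beta) * e_part * p_neg / e_lambda
    t_low = e_part / mu                                # T_l^(inf)(h, mu), see (18)
    bound = e_lambda / (e_lambda + t_low * p_neg)      # right side of the PDC bound
    return mu, bound


if __name__ == "__main__":
    print(mu_star(0.5, math.inf))
    print(mu_star(0.5, 1.0))
```

For $\beta = 0.5$, the resulting $\mu^*$ is noticeably smaller than the value $\beta D(f_0 \,\|\, f_1)/(1-\beta) = 0.28125$ suggested by the approximation of Section V-F. This illustrates the conservativeness noted in Remark 11.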



Remark 11: The existence of $\mu^*$ proved by Theorem 5.2 above is critical for the asymptotic optimality of the DE-CuSum algorithm. However, the estimate it provides may be somewhat conservative when substituted for $\mu$ in (22). In Section V-F we provide a good approximation to the PDC that can be used to choose the value of $\mu$ in practice. In Section VI we provide numerical results showing the accuracy of the approximation.



Remark 12: By Theorem 5.2, for any value of $h > 0$, we can select a value of $\mu$ small enough that any PDC constraint close to zero can be met with equality. However, meeting the PDC constraint with equality may not be possible if $\beta$ is close to 1. To see why, suppose $h > 0$. Then every sojourn below zero lasts at least one slot, so $\mathsf{E}_\infty[T((W_{\lambda_D})^{h+}, 0, \mu) \mid W_{\lambda_D} < 0] \ge 1$, and by (22) and Lemma 1,

$$\mathrm{PDC}(\Psi_W) \le \frac{\mathsf{E}_\infty[\lambda_\infty]}{\mathsf{E}_\infty[\lambda_\infty] + \mathsf{P}_\infty(\log L(X_1) < 0)} < 1.$$

As mentioned earlier, however, for most practical applications $\beta$ will be closer to zero than to 1, and hence this issue will not be encountered. If a $\beta$ close to 1 is indeed desired, then the DE-CuSum algorithm can easily be modified to address this issue, by skipping samples only when the undershoot is larger than a pre-designed threshold.



B. Analysis of the CuSum algorithm 

In the sections to follow, we will express the performance of the DE-CuSum algorithm in terms of the 
performance of the CuSum algorithm. Therefore, in this section we summarize the performance of the 
CuSum algorithm. 
It is well known (see [6], [20], [3]) that

$$\mathrm{CADD}(\Psi_C) = \mathrm{WADD}(\Psi_C) = \mathsf{E}_1[\tau_C - 1]. \tag{24}$$

From [6], if $0 < D(f_1 \,\|\, f_0) < \infty$, then $\mathsf{E}_1[\tau_C] < \infty$. Moreover, if $\{\lambda_1, \lambda_2, \ldots\}$ are i.i.d. random variables, each with the distribution of $\lambda_D$, then by Wald's lemma [20]

$$\mathsf{E}_1[\tau_C] = \mathsf{E}_1\Big[\sum_{k=1}^{N} \lambda_k\Big] = \mathsf{E}_1[N]\, \mathsf{E}_1[\lambda_D], \tag{25}$$

where $N$ is the number of two-sided tests (SPRTs), each with the distribution of $\lambda_D$, executed before the change is declared.

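It is also shown in [6] that $0 < D(f_1 \,\|\, f_0) < \infty$ is sufficient to guarantee $\mathsf{E}_\infty[\tau_C] < \infty$ and $\mathrm{FAR}(\Psi_C) > 0$. Moreover,

$$\mathsf{E}_\infty[\tau_C] = \mathsf{E}_\infty\Big[\sum_{k=1}^{N} \lambda_k\Big] = \mathsf{E}_\infty[N]\, \mathsf{E}_\infty[\lambda_D]. \tag{26}$$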



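The proof of the following theorem can be found in [6] and [10].

Theorem 5.3: If $0 < D(f_1 \,\|\, f_0) < \infty$, then with $D = \log\frac{1}{\alpha}$,

$$\mathrm{FAR}(\Psi_C) \le \alpha,$$

and as $\alpha \to 0$,

$$\mathrm{CADD}(\Psi_C) = \mathrm{WADD}(\Psi_C) = \mathsf{E}_1[\tau_C - 1] \sim \frac{|\log \alpha|}{D(f_1 \,\|\, f_0)}.$$

Thus, the CuSum algorithm is asymptotically optimal for both Problem 3 and Problem 2, because for any stopping time $\tau$ with $\mathrm{FAR}(\tau) \le \alpha$,

$$\mathrm{WADD}(\tau) \ge \mathrm{CADD}(\tau) \ge \frac{|\log \alpha|}{D(f_1 \,\|\, f_0)}\,(1 + o(1)) \tag{27}$$

as $\alpha \to 0$.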
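The following is a quick Monte Carlo look at Theorem 5.3. The sketch below sets $D = \log(1/\alpha)$ and estimates $\mathsf{E}_1[\tau_C - 1]$, the delay when the change occurs at $n = 1$, for the Gaussian pair of Section VI, which is assumed here. It prints this estimate next to the first-order approximation $|\log\alpha|/D(f_1 \,\|\, f_0)$. Since Theorem 5.3 is asymptotic, only rough agreement should be expected at moderate $\alpha$. Names are hypothetical.

```python
import math
import random

THETA = 0.75            # assumed: f0 = N(0,1), f1 = N(THETA,1)
KL_10 = THETA ** 2 / 2  # D(f1 || f0) for this pair


def cusum_delay_at_1(D, reps=4000, seed=2):
    """Monte Carlo estimate of E_1[tau_C - 1] (change at n = 1): all X_n ~ f1."""
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        c, n = 0.0, 0
        while c <= D:
            n += 1
            x = rng.gauss(THETA, 1.0)
            c = max(c + THETA * x - THETA * THETA / 2, 0.0)  # CuSum update
        total += n - 1
    return total / reps


if __name__ == "__main__":
    for alpha in (1e-2, 1e-3, 1e-4):
        D = math.log(1 / alpha)  # threshold from Theorem 5.3
        est = cusum_delay_at_1(D)
        print(f"alpha={alpha:g}: D={D:.2f}, E1[tau_C - 1] ~ {est:.1f}, "
              f"|log alpha|/D(f1||f0) = {abs(math.log(alpha)) / KL_10:.1f}")
```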



C. FAR for the DE-CuSum algorithm 

In this section we characterize the false alarm rate of the DE-CuSum algorithm. Fix $D$, $\mu$ and $h$, and apply the DE-CuSum algorithm and the CuSum algorithm to the same sequence of random variables. The following lemma shows that, sample-pathwise, the DE-CuSum statistic $W_n$ is then always below the CuSum statistic $C_n$. Thus, the DE-CuSum algorithm crosses the threshold $D$ only after the CuSum algorithm has crossed it.

Lemma 4: Under any $\mathsf{P}_n$, $n \ge 1$, and under $\mathsf{P}_\infty$,

$$C_n \ge W_n \quad \text{for all } n.$$

Thus,

$$\tau_C \le \tau_W.$$

Proof: This follows directly from the definition of the DE-CuSum algorithm. Suppose a sequence of samples causes the statistic of the DE-CuSum algorithm to go above $D$. Since all the samples are utilized in the CuSum algorithm, the same sequence must also cause the CuSum statistic to go above $D$. ■
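Lemma 4 is a sample-path statement, so it can be checked on simulated paths. The sketch below uses the same assumed DE-CuSum recursion and Gaussian pair as before, with function names of our own. It computes both statistics on one sequence of observations and verifies that $W_n \le C_n$ for every $n$ and that $\tau_C \le \tau_W$.

```python
import math
import random

THETA = 0.75  # assumed: f0 = N(0,1), f1 = N(THETA,1)


def log_lr(x):
    return THETA * x - THETA * THETA / 2


def cusum_path(xs):
    """C_n = max(C_{n-1} + log L(X_n), 0), C_0 = 0 (no stopping)."""
    c, out = 0.0, []
    for x in xs:
        c = max(c + log_lr(x), 0.0)
        out.append(c)
    return out


def de_cusum_path(xs, mu, h):
    """DE-CuSum statistic under the assumed Section IV recursion (no stopping)."""
    w, out = 0.0, []
    for x in xs:
        w = min(w + mu, 0.0) if w < 0 else max(w + log_lr(x), -h)
        out.append(w)
    return out


def first_crossing(path, D):
    """First n with statistic > D, or None."""
    return next((n for n, v in enumerate(path, start=1) if v > D), None)


def check_lemma4(seed, gamma=200, n=3000, mu=0.1, h=math.inf, D=5.0):
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(gamma - 1)]
    xs += [rng.gauss(THETA, 1.0) for _ in range(n - gamma + 1)]
    c, w = cusum_path(xs), de_cusum_path(xs, mu, h)
    pathwise = all(wn <= cn for wn, cn in zip(w, c))
    return pathwise, first_crossing(c, D), first_crossing(w, D)


if __name__ == "__main__":
    print([check_lemma4(seed) for seed in range(3)])
```

Floating-point rounding cannot break the check, since IEEE addition is monotone: if $W_{n-1} \le C_{n-1}$, then the rounded sums satisfy the same inequality.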
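It follows as a corollary of Lemma 4 that

$$\mathsf{E}_\infty[\tau_W] \ge \mathsf{E}_\infty[\tau_C], \quad \text{i.e.,} \quad \mathrm{FAR}(\Psi_W) \le \mathrm{FAR}(\Psi_C).$$

The following theorem shows that these quantities are finite and also provides an estimate for $\mathrm{FAR}(\Psi_W)$.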



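Theorem 5.4: For any fixed $h$ (including $h = \infty$) and $\mu > 0$, if

$$0 < D(f_0 \,\|\, f_1) < \infty \quad \text{and} \quad 0 < D(f_1 \,\|\, f_0) < \infty,$$

then with $D = \log\frac{1}{\alpha}$,

$$\mathrm{FAR}(\Psi_W) \le \mathrm{FAR}(\Psi_C) \le \alpha.$$

Moreover, for any $D$,

$$\begin{aligned}
\mathsf{E}_\infty[\tau_W] &= \frac{\mathsf{E}_\infty[\tilde{\lambda}_D]}{\mathsf{P}_\infty(W_{\lambda_D} > 0)} \\
&= \frac{\mathsf{E}_\infty[\lambda_D]}{\mathsf{P}_\infty(W_{\lambda_D} > 0)} + \frac{\mathsf{E}_\infty[T((W_{\lambda_D})^{h+}, 0, \mu)\, \mathbb{I}_{\{W_{\lambda_D} < 0\}}]}{\mathsf{P}_\infty(W_{\lambda_D} > 0)} \\
&= \mathsf{E}_\infty[\tau_C] + \frac{\mathsf{E}_\infty[T((W_{\lambda_D})^{h+}, 0, \mu)\, \mathbb{I}_{\{W_{\lambda_D} < 0\}}]}{\mathsf{P}_\infty(W_{\lambda_D} > 0)},
\end{aligned} \tag{28}$$

and as $D \to \infty$,

$$\frac{\mathrm{FAR}(\Psi_W)}{\mathrm{FAR}(\Psi_C)} \to \frac{\mathsf{E}_\infty[\lambda_\infty]}{\mathsf{E}_\infty[\lambda_\infty] + \mathsf{E}_\infty[T((W_{\lambda_\infty})^{h+}, 0, \mu)]}, \tag{29}$$

where $\lambda_\infty$ is the variable $\lambda_D$ with $D = \infty$. The limit in (29) is strictly less than 1 if $h > 0$.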

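Proof: For a fixed $D$, let $N_D$ be the number of two-sided tests, each with the distribution of $\tilde{\lambda}_D$, executed before the change is declared in the DE-CuSum algorithm. If $\{\tilde{\lambda}_1, \tilde{\lambda}_2, \ldots\}$ is a sequence of i.i.d. random variables, each with the distribution of $\tilde{\lambda}_D$, then

$$\mathsf{E}_\infty[\tau_W] = \mathsf{E}_\infty\Big[\sum_{k=1}^{N_D} \tilde{\lambda}_k\Big].$$

Because of the renewal nature of the DE-CuSum algorithm,

$$\mathsf{E}_\infty[N_D] = \mathsf{E}_\infty[N],$$

where $N$ is the number of SPRTs used in the CuSum algorithm. Thus, from (26),

$$\mathsf{E}_\infty[N_D] = \mathsf{E}_\infty[N] \le \mathsf{E}_\infty[\tau_C] < \infty.$$

Further, from (14),

$$\mathsf{E}_\infty[\tilde{\lambda}_D] = \mathsf{E}_\infty[\lambda_D] + \mathsf{E}_\infty[T((W_{\lambda_D})^{h+}, 0, \mu)\, \mathbb{I}_{\{W_{\lambda_D} < 0\}}].$$

From (26) again,

$$\mathsf{E}_\infty[\lambda_D] \le \mathsf{E}_\infty[\tau_C] < \infty.$$

Moreover, from Lemma 2,

$$\mathsf{E}_\infty[T((W_{\lambda_D})^{h+}, 0, \mu)\, \mathbb{I}_{\{W_{\lambda_D} < 0\}}] \le \mathsf{E}_\infty[T((W_{\lambda_D})^{h+}, 0, \mu) \mid W_{\lambda_D} < 0] \le T_u^{(\infty)}(h, \mu) < \infty.$$

Thus, $\mathsf{E}_\infty[\tilde{\lambda}_D] < \infty$ and, by Wald's lemma,

$$\mathsf{E}_\infty[\tau_W] = \mathsf{E}_\infty\Big[\sum_{k=1}^{N_D} \tilde{\lambda}_k\Big] = \mathsf{E}_\infty[N_D]\, \mathsf{E}_\infty[\tilde{\lambda}_D] < \infty.$$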

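It follows as a corollary of Lemma 4 and Theorem 5.3 that, for $D = \log\frac{1}{\alpha}$,

$$\mathrm{FAR}(\Psi_W) \le \mathrm{FAR}(\Psi_C) \le \alpha.$$

Since $N_D$ is $\mathrm{Geom}(\mathsf{P}_\infty(W_{\lambda_D} > 0))$, (28) follows from (14) and (26). Further, since $\mathsf{E}_\infty[N_D] = \mathsf{E}_\infty[N]$, we have

$$\frac{\mathsf{E}_\infty[\tau_C]}{\mathsf{E}_\infty[\tau_W]} = \frac{\mathsf{E}_\infty[N]\, \mathsf{E}_\infty[\lambda_D]}{\mathsf{E}_\infty[N_D]\, \mathsf{E}_\infty[\tilde{\lambda}_D]} = \frac{\mathsf{E}_\infty[\lambda_D]}{\mathsf{E}_\infty[\tilde{\lambda}_D]}.$$

If

$$\mathcal{C} = \{W_n \text{ reaches below zero only after touching } D\},$$

then $\mathsf{P}_\infty(\mathcal{C}) \to 0$ as $D \to \infty$. Since $T((W_{\lambda_\infty})^{h+}, 0, \mu)$ and $\lambda_\infty$ are integrable under $\mathsf{P}_\infty$,

$$\mathsf{E}_\infty[T((W_{\lambda_\infty})^{h+}, 0, \mu);\, \mathcal{C}] \to 0$$

and

$$\mathsf{E}_\infty[\lambda_\infty;\, \mathcal{C}] \to 0.$$

Thus, as $D \to \infty$,

$$\frac{\mathsf{E}_\infty[\tau_C]}{\mathsf{E}_\infty[\tau_W]} = \frac{\mathsf{E}_\infty[\lambda_D]}{\mathsf{E}_\infty[\tilde{\lambda}_D]} \to \frac{\mathsf{E}_\infty[\lambda_\infty]}{\mathsf{E}_\infty[\lambda_\infty] + \mathsf{E}_\infty[T((W_{\lambda_\infty})^{h+}, 0, \mu)]}. \tag{30}$$

The limit is clearly less than 1 if $h > 0$. ■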
Remark 13: In the Bayesian setting, the PFA of the DE-Shiryaev algorithm converges to the PFA of the Shiryaev algorithm. Here, by contrast, the FAR of the DE-CuSum algorithm is, for large $D$ and $h > 0$, strictly less than the FAR of the CuSum algorithm. Moreover, for large $D$, the right-hand side of (29) is approximately the PDC achieved. Thus, (29) shows that, asymptotically as $D \to \infty$, the ratio of the FARs is approximately equal to the PDC. This also shows that one can set the threshold in the DE-CuSum algorithm to a value much smaller than $D = \log\frac{1}{\alpha}$ and still meet the FAR constraint with equality, which yields better delay performance. This latter fact will be used in obtaining the numerical results in Section VI.
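The limit in (29) can be estimated with the same kind of simulation; by Remark 13, it is also approximately the PDC for large $D$. The sketch below assumes the Gaussian pair of Section VI and uses hypothetical names. It compares the Monte Carlo value with the approximation (32) given in Section V-F, $\mu/(\mu + D(f_0 \,\|\, f_1))$. Because of the ceiling in (13), the Monte Carlo value should sit slightly below that approximation.

```python
import math
import random

THETA = 0.75            # assumed: f0 = N(0,1), f1 = N(THETA,1)
KL_01 = THETA ** 2 / 2  # D(f0 || f1)


def far_ratio_limit(mu, h, reps=40000, seed=13):
    """Monte Carlo estimate of the right-hand side of (29):
    E[lambda_inf] / (E[lambda_inf] + E[T((W_{lambda_inf})^{h+}, 0, mu)])."""
    rng = random.Random(seed)
    tot_lam = tot_t = 0
    for _ in range(reps):
        w, n = 0.0, 0
        while w >= 0.0:  # SPRT with D = infinity: run until W drops below 0
            n += 1
            w += THETA * rng.gauss(0.0, 1.0) - THETA * THETA / 2
        tot_lam += n
        tot_t += math.ceil(-max(w, -h) / mu)  # sojourn length from (13)
    return tot_lam / (tot_lam + tot_t)


if __name__ == "__main__":
    for mu in (0.05, 0.1, 0.3):
        mc = far_ratio_limit(mu, math.inf)
        print(f"mu={mu}: limit of FAR ratio ~ {mc:.3f}, "
              f"approximation (32) = {mu / (mu + KL_01):.3f}")
```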



D. CADD and WADD of the DE-CuSum algorithm 

We now provide upper bounds on the CADD and WADD of the DE-CuSum algorithm. The main content of Theorem 5.5 and Theorem 5.6 below is that, for each value of $D$, the CADD and WADD of the DE-CuSum algorithm are within a constant of the corresponding performance of the CuSum algorithm. This constant is independent of the choice of $D$, and as a result the delay performances of the two algorithms are asymptotically the same.

The results depend on the following fundamental lemma. Suppose the change happens at $n = 1$. The lemma says that the average delay of the DE-CuSum algorithm starting with $W_0 = x > 0$ is then upper bounded by the average delay of the algorithm when $W_0 = 0$, plus a constant. Let

$$\tau_W(x) = \inf\{n \ge 1 : W_n > D;\ W_0 = x\}.$$

Here, $W_n$ is the DE-CuSum statistic and evolves according to the description of the algorithm in Section IV. Thus, $\tau_W(x)$ is the first time for the DE-CuSum algorithm to cross $D$ when starting at $W_0 = x$. Clearly, $\tau_W(x) = \tau_W$ if $x = 0$.

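Lemma 5: Let $0 < D(f_1 \,\|\, f_0) < \infty$ and $0 \le x \le D$. Then,

$$\mathsf{E}_1[\tau_W(x)] \le \mathsf{E}_1[\tau_W] + T_u^{(1)}(h, \mu),$$

where $T_u^{(1)}(h, \mu)$ is the upper bound (21) from Lemma 3 on the mean time $T(\cdot, 0, \mu)$ spent below zero. Moreover, if $h < \infty$, then

$$\mathsf{E}_1[\tau_W(x)] \le \mathsf{E}_1[\tau_W] + \lceil h/\mu \rceil.$$

Proof: The proof is provided in the appendix. ■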
We first provide the result for the CADDs of the two algorithms.

Theorem 5.5: Let

$$0 < D(f_0 \,\|\, f_1) < \infty \quad \text{and} \quad 0 < D(f_1 \,\|\, f_0) < \infty.$$

Then, for fixed values of $\mu > 0$ and $h$, and for each $D$,

$$\mathrm{CADD}(\Psi_W) \le \mathrm{CADD}(\Psi_C) + K_1,$$

where $K_1$ is a constant that is not a function of $D$. Thus, as $D \to \infty$,

$$\mathrm{CADD}(\Psi_W) \le \mathrm{CADD}(\Psi_C) + O(1).$$
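Proof: If the change happens at $n = 1$, then

$$\mathsf{E}_1[\tau_W - 1 \mid \tau_W \ge 1] = \mathsf{E}_1[\tau_W] - 1 < \mathsf{E}_1[\tau_W].$$

Let the change happen at time $n > 1$. On $\{W_{n-1} \ge 0\}$, by Lemma 5, the average delay is bounded from above by $\mathsf{E}_1[\tau_W] + T_u^{(1)}(h, \mu)$. On $\{W_{n-1} < 0\}$, the average delay is bounded from above by $\mathsf{E}_1[\tau_W]$ plus the maximum possible average time spent by the DE-CuSum statistic below 0 under $\mathsf{P}_\infty$, which is $T_u^{(\infty)}(h, \mu)$. Thus, from Lemma 2, for $n > 1$,

$$\begin{aligned}
\mathsf{E}_n[\tau_W - n \mid \tau_W \ge n] &\le \big(\mathsf{E}_1[\tau_W] + T_u^{(1)}(h, \mu)\big)\, \mathsf{P}_\infty(W_{n-1} \ge 0 \mid \tau_W \ge n) \\
&\quad + \big(\mathsf{E}_1[\tau_W] + T_u^{(\infty)}(h, \mu)\big)\, \mathsf{P}_\infty(W_{n-1} < 0 \mid \tau_W \ge n).
\end{aligned}$$

Thus, for all $n \ge 1$,

$$\mathsf{E}_n[\tau_W - n \mid \tau_W \ge n] \le \mathsf{E}_1[\tau_W] + T_u^{(1)}(h, \mu) + T_u^{(\infty)}(h, \mu).$$

Since the right-hand side of the above equation is not a function of $n$, we have

$$\mathrm{CADD}(\Psi_W) \le \mathsf{E}_1[\tau_W] + T_u^{(1)}(h, \mu) + T_u^{(\infty)}(h, \mu).$$

Following Theorem 5.4 and its proof, it is easy to see that

$$\mathsf{E}_1[\tau_W] = \mathsf{E}_1[\tau_C] + \frac{\mathsf{E}_1[T((W_{\lambda_D})^{h+}, 0, \mu)\, \mathbb{I}_{\{W_{\lambda_D} < 0\}}]}{\mathsf{P}_1(W_{\lambda_D} > 0)}.$$

From Lemma 3 and the fact that $\mathsf{P}_1(W_{\lambda_D} > 0) \ge \mathsf{P}_1(W_{\lambda_\infty} > 0)$, we have

$$\frac{\mathsf{E}_1[T((W_{\lambda_D})^{h+}, 0, \mu)\, \mathbb{I}_{\{W_{\lambda_D} < 0\}}]}{\mathsf{P}_1(W_{\lambda_D} > 0)} \le \frac{T_u^{(1)}(h, \mu)}{\mathsf{P}_1(W_{\lambda_\infty} > 0)}.$$

Also, from (24), $\mathrm{CADD}(\Psi_C) = \mathsf{E}_1[\tau_C - 1]$. Thus,

$$\mathrm{CADD}(\Psi_W) \le \mathrm{CADD}(\Psi_C) + \frac{T_u^{(1)}(h, \mu)}{\mathsf{P}_1(W_{\lambda_\infty} > 0)} + T_u^{(1)}(h, \mu) + T_u^{(\infty)}(h, \mu) + 1.$$

This proves the theorem. ■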

Remark 14: Note that the above theorem is valid even if $h$ is not finite. In contrast, as we will see below, $\mathrm{WADD}(\Psi_W) = \infty$ if $h = \infty$. As a result, a bound on the number of samples skipped is needed for the worst-case delay, in the sense of Lorden's criterion, to be finite.

We now express the WADD of the DE-CuSum algorithm in terms of the WADD of the CuSum 
algorithm. 

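Theorem 5.6: Let

$$0 < D(f_0 \,\|\, f_1) < \infty \quad \text{and} \quad 0 < D(f_1 \,\|\, f_0) < \infty.$$

Then, for fixed values of $\mu > 0$ and $h < \infty$, and for each $D$,

$$\mathrm{WADD}(\Psi_W) \le \mathrm{WADD}(\Psi_C) + K_2,$$

where $K_2$ is a constant that is not a function of $D$. Thus, as $D \to \infty$,

$$\mathrm{WADD}(\Psi_W) \le \mathrm{WADD}(\Psi_C) + O(1).$$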
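Proof: For $n > 1$, the following bound follows from Lemma 5, together with the fact that on $\{W_{n-1} < 0\}$ the remaining sojourn below zero lasts at most $\lceil h/\mu \rceil$ slots:

$$\operatorname*{ess\,sup}\, \mathsf{E}_n[(\tau_W - n)^+ \mid I_{n-1}] \le \lceil h/\mu \rceil + \mathsf{E}_1[\tau_W].$$

Since the right-hand side is not a function of $n$ and it is greater than $\mathsf{E}_1[\tau_W - 1]$, we have

$$\mathrm{WADD}(\Psi_W) \le \lceil h/\mu \rceil + \mathsf{E}_1[\tau_W].$$

Thus, from the proof of the theorem above and (24),

$$\mathsf{E}_1[\tau_W] \le \mathsf{E}_1[\tau_C] + \frac{T_u^{(1)}(h, \mu)}{\mathsf{P}_1(W_{\lambda_\infty} > 0)} = \mathrm{WADD}(\Psi_C) + \frac{T_u^{(1)}(h, \mu)}{\mathsf{P}_1(W_{\lambda_\infty} > 0)} + 1,$$

and, using $\lceil h/\mu \rceil \le h/\mu + 1$, we have

$$\mathrm{WADD}(\Psi_W) \le \mathrm{WADD}(\Psi_C) + \frac{T_u^{(1)}(h, \mu)}{\mathsf{P}_1(W_{\lambda_\infty} > 0)} + \frac{h}{\mu} + 2.$$

This proves the theorem. ■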
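The constants in Theorems 5.5 and 5.6 are explicit, so they can be estimated. From the proofs above, $K_1 = T_u^{(1)}/\mathsf{P}_1(W_{\lambda_\infty} > 0) + T_u^{(1)} + T_u^{(\infty)} + 1$ and $K_2 = T_u^{(1)}/\mathsf{P}_1(W_{\lambda_\infty} > 0) + h/\mu + 2$. The sketch below estimates them for the Gaussian pair of Section VI and rests on three assumptions of ours, along with our function names:

- $\mathsf{P}_1(W_{\lambda_\infty} > 0)$ is read as the probability that the post-change log-likelihood walk never drops below zero.
- Walks that exceed a cap of 20 are stopped. Optional stopping applied to the $\mathsf{P}_1$-martingale $e^{-W_n}$ shows that this misclassifies a walk with probability at most $e^{-20}$.
- $\mathsf{P}_\infty(\log L(X_1) < 0) = \Phi(\theta/2)$ and $\mathsf{P}_1(\log L(X_1) < 0) = \Phi(-\theta/2)$ for this pair.

```python
import math
import random

THETA = 0.75  # assumed: f0 = N(0,1), f1 = N(THETA,1)


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def walk_until_below_zero(rng, mean, cap):
    """Log-likelihood walk W_n = sum of THETA * X_k - THETA^2 / 2 with
    X_k ~ N(mean, 1), W_0 = 0. Returns W at the first drop below 0, or None
    if the walk first exceeds `cap` (treated as 'never drops below 0')."""
    w = 0.0
    while True:
        w += THETA * rng.gauss(mean, 1.0) - THETA * THETA / 2
        if w < 0:
            return w
        if w > cap:
            return None


def delay_constants(mu, h, reps=10000, seed=17, cap=20.0):
    """Monte Carlo estimates of T_u^(inf), T_u^(1), P_1(W_{lambda_inf} > 0)
    and of the constants K1 (CADD, Theorem 5.5) and K2 (WADD, Theorem 5.6)."""
    rng = random.Random(seed)
    p_inf_neg = std_normal_cdf(THETA / 2)   # P_inf(log L(X_1) < 0)
    p_1_neg = std_normal_cdf(-THETA / 2)    # P_1(log L(X_1) < 0)
    # E_inf[|(W_{lambda_inf})^{h+}|]: under P_inf the walk drops below 0 a.s.
    e_abs = sum(abs(max(walk_until_below_zero(rng, 0.0, math.inf), -h))
                for _ in range(reps)) / reps
    t_u_inf = e_abs / (mu * p_inf_neg) + 1.0   # (19)
    t_u_1 = e_abs / (mu * p_1_neg) + 1.0       # (21)
    # P_1(W_{lambda_inf} > 0), read as P_1(the walk never drops below 0).
    p1_never = sum(walk_until_below_zero(rng, THETA, cap) is None
                   for _ in range(reps)) / reps
    k1 = t_u_1 / p1_never + t_u_1 + t_u_inf + 1.0
    k2 = t_u_1 / p1_never + h / mu + 2.0
    return {"T_u_inf": t_u_inf, "T_u_1": t_u_1, "P1_never": p1_never,
            "K1": k1, "K2": k2}


if __name__ == "__main__":
    for h in (1.0, math.inf):
        print(h, {k: round(v, 3) for k, v in delay_constants(0.1, h).items()})
```

With $h = \infty$ the estimate of $K_2$ is infinite, which matches Remark 14.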



The following corollary follows easily from Theorem 5.3, Theorem 5.5 and Theorem 5.6.

Corollary 1: If $0 < D(f_1 \,\|\, f_0) < \infty$ and $0 < D(f_0 \,\|\, f_1) < \infty$, then for fixed values of $\mu$ and $h$, including the case $h = \infty$ (no truncation), as $D \to \infty$,

$$\mathrm{CADD}(\Psi_W) \sim \frac{D}{D(f_1 \,\|\, f_0)}.$$

Moreover, if $h < \infty$, then as $D \to \infty$,

$$\mathrm{WADD}(\Psi_W) \sim \frac{D}{D(f_1 \,\|\, f_0)}.$$

E. Asymptotic optimality of the DE-CuSum algorithm 

We now use the results from the previous sections to show that the DE-CuSum algorithm is asymptotically optimal.

The following theorem says that, for a given PDC constraint of $\beta$, the DE-CuSum algorithm is asymptotically optimal for both Problem 3 and Problem 2 as $\alpha \to 0$, for the following reasons:

• the PDC of the DE-CuSum algorithm can be designed to meet the constraint independent of the choice of $D$,

• the CADD and WADD of the DE-CuSum algorithm approach the corresponding performances of the CuSum algorithm,

• the FAR of the DE-CuSum algorithm is always better than that of the CuSum algorithm, and

• the CuSum algorithm is asymptotically optimal for both Problem 3 and Problem 2 as $\alpha \to 0$.

Theorem 5.7: Let $0 < D(f_1 \,\|\, f_0) < \infty$ and $0 < D(f_0 \,\|\, f_1) < \infty$. For a given $\alpha$, set $D = \log\frac{1}{\alpha}$. Then, for any choice of $h$ and $\mu$,

$$\mathrm{FAR}(\Psi_W) \le \mathrm{FAR}(\Psi_C) \le \alpha.$$

For a given $\beta$ and any given $h$, it is possible to select $\mu = \mu^*(h)$ such that, for every $D$, and hence even with $D = \log\frac{1}{\alpha}$,

$$\mathrm{PDC}(\Psi_W) \le \beta.$$

Moreover, fix $\beta$ and any $h$, and select $\mu^*(h)$ to meet this PDC constraint of $\beta$. Then, as $\alpha \to 0$ (or $D \to \infty$, because $D = \log\frac{1}{\alpha}$),

$$\mathrm{CADD}\big(\Psi_W(\log\tfrac{1}{\alpha}, \mu^*(h), h)\big) \sim \mathrm{CADD}(\Psi_C) \sim \frac{|\log \alpha|}{D(f_1 \,\|\, f_0)}.$$

Furthermore, if the $h$ chosen above is finite, then

$$\mathrm{WADD}\big(\Psi_W(\log\tfrac{1}{\alpha}, \mu^*(h), h)\big) \sim \mathrm{WADD}(\Psi_C) \sim \frac{|\log \alpha|}{D(f_1 \,\|\, f_0)}.$$

Proof: The result on FAR follows from Theorem 5.4. The fact that one can select $\mu = \mu^*(h)$ to meet the PDC constraint independent of the choice of $D$ follows from Theorem 5.2. Finally, the delay asymptotics follow from Theorem 5.5, Theorem 5.6 and Corollary 1. ■
By Theorem 5.3, $|\log \alpha|/D(f_1 \,\|\, f_0)$ is the best possible asymptotic performance for any given FAR constraint of $\alpha$. Hence, the above statement establishes the asymptotic optimality of the DE-CuSum algorithm for both Problem 2 and Problem 3.

F. Design of the DE-CuSum algorithm 

We now discuss how to set the parameters $\mu$, $h$ and $D$ so as to meet a given FAR constraint of $\alpha$ and a PDC constraint of $\beta$.

Theorem 5.4 provides the guideline for choosing $D$: for any $h$ and $\mu$,

$$\text{if } D = \log\frac{1}{\alpha}, \text{ then } \mathrm{FAR}(\Psi_W) \le \alpha.$$



As discussed earlier, Theorem 5.2 provides a conservative estimate of the PDC. For practical purposes, we suggest using the following approximation for the PDC:

$$\mathrm{PDC} \approx \frac{\mathsf{E}_\infty[\lambda_\infty]}{\mathsf{E}_\infty[\lambda_\infty] + \mathsf{E}_\infty[|(W_{\lambda_\infty})^{h+}|]/\mu}. \tag{31}$$

For large values of $D$, (31) will indeed provide a good estimate of the PDC. We note that $\mathsf{E}_\infty[\lambda_\infty]$ can be computed numerically; see Corollary 2.4 in [22].

If $h = \infty$, then using (16) we can further simplify (31) to

$$\mathrm{PDC} \approx \frac{\mathsf{E}_\infty[\lambda_\infty]}{\mathsf{E}_\infty[\lambda_\infty] + \mathsf{E}_\infty[|W_{\lambda_\infty}|]/\mu} = \frac{\mu}{\mu + D(f_0 \,\|\, f_1)}. \tag{32}$$

Thus, to ensure $\mathrm{PDC} \le \beta$, the approximation above suggests selecting $\mu$ such that

$$\mu \le \frac{\beta}{1 - \beta}\, D(f_0 \,\|\, f_1).$$

In Section VI we provide numerical results showing that the approximation (32) indeed provides a good estimate of the PDC when $h = \infty$.
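The design rules above reduce to a few lines of arithmetic. The sketch below (names hypothetical) assumes $h = \infty$ and the Gaussian mean-shift pair of Section VI, for which $D(f_0 \,\|\, f_1) = \theta^2/2 = 0.28125$. Its `pdc_approx` values agree with the approximation column of Table II to within 0.01.

```python
import math


def kl_gaussian_mean_shift(theta, sigma=1.0):
    """D(f0 || f1) = D(f1 || f0) = theta^2 / (2 sigma^2) for N(0, s^2) vs N(theta, s^2)."""
    return theta ** 2 / (2 * sigma ** 2)


def pdc_approx(mu, kl01):
    """Approximation (32) for h = infinity: mu / (mu + D(f0 || f1))."""
    return mu / (mu + kl01)


def design_de_cusum(alpha, beta, kl01):
    """Threshold D = log(1/alpha) (Theorem 5.4) and the largest mu allowed by
    mu <= beta / (1 - beta) * D(f0 || f1), which follows from (32); h = infinity."""
    if not (0 < alpha < 1 and 0 < beta < 1):
        raise ValueError("alpha and beta must lie in (0, 1)")
    return math.log(1.0 / alpha), beta / (1.0 - beta) * kl01


if __name__ == "__main__":
    kl = kl_gaussian_mean_shift(0.75)          # 0.28125, the Section VI pair
    print(design_de_cusum(1e-3, 0.5, kl))      # (6.907..., 0.28125)
    for mu in (0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.6):
        print(mu, round(pdc_approx(mu, kl), 3))
```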

VI. Trade-off curves 

The asymptotic optimality of the DE-CuSum algorithm for all $\beta$ does not guarantee good performance for moderate values of FAR. In Fig. 5 we plot the trade-off curves for the CuSum algorithm and the DE-CuSum algorithm, obtained using simulations. We plot the performance of the DE-CuSum algorithm for two different PDC constraints: $\beta = 0.5$ and $\beta = 0.25$. For simplicity, we restrict ourselves to the CADD performance for $h = \infty$ in this section. Similar performance comparisons can be obtained for CADD with $h < \infty$, and for WADD.
Each of the curves for the DE-CuSum algorithm in Fig. 5 is obtained in the following way. Five different threshold values for $D$ were arbitrarily selected. For each threshold value, a large value of $\gamma$ was chosen, the DE-CuSum algorithm was simulated, and the fraction of time the observations are taken before the change was computed. Specifically, $\gamma$ was increased in multiples of 100, and an estimate of the PDC was obtained by Monte Carlo simulations. The value of $\mu$ was chosen so that the PDC value obtained in simulations was slightly below the constraint $\beta = 0.5$ or $0.25$. For this value of $\mu$ and the chosen threshold, the FAR was computed by setting the change time to $\gamma = \infty$, i.e., generating random numbers from $f_0 \sim \mathcal{N}(0,1)$. The CADD was then computed for this choice of $\mu$ and $D$ by varying $\gamma$ over $1, 2, \ldots$ and recording the maximum of the conditional delay. The maximum was achieved in the first five slots.

As can be seen from the figure, a PDC of 0.5 (using only 50% of the samples in the long run) can be achieved using the DE-CuSum algorithm with a small penalty on the delay. If we wish to achieve a PDC of 0.25, then we have to incur a significant penalty (of approximately 6 slots in Fig. 5). Note, however, that the difference in delay from the CuSum algorithm remains fixed as $\mathrm{FAR} \to 0$. This is due to the result reported in Theorem 5.5, and it is precisely the reason the DE-CuSum algorithm is asymptotically optimal. The trade-off between CADD and FAR is a function of the K-L divergence between the pdfs $f_1$ and $f_0$: the larger the K-L divergence, the larger the fraction of samples that can be dropped for a given loss in delay performance.

In Fig. 6 we compare the performance of the DE-CuSum algorithm with the fractional sampling scheme. In that scheme, to achieve a PDC of $\beta$, the CuSum algorithm is employed and each sample is chosen with probability $\beta$ for decision making. Note that this scheme skips samples without exploiting any knowledge about the state of the system. As seen in Fig. 6, the DE-CuSum algorithm performs considerably better than the fractional sampling scheme. Thus, the trade-off curves show that the DE-CuSum algorithm performs well even for moderate FAR, when the PDC constraint is moderate.
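For comparison, the fractional sampling scheme described above can be sketched as follows. An independent coin with heads probability $\beta$ decides whether $X_n$ is used, and the CuSum recursion runs on the selected samples only. This is our illustrative reading of the scheme, with hypothetical names and the Gaussian pair of Section VI assumed. It does not reproduce Fig. 6, which requires matching FARs across thresholds.

```python
import random

THETA = 0.75  # assumed: f0 = N(0,1), f1 = N(THETA,1)


def log_lr(x):
    """log L(x) for the Gaussian mean-shift pair."""
    return THETA * x - THETA * THETA / 2


def fractional_sampling_cusum(xs, D, beta, rng):
    """CuSum run only on the samples selected by independent coin tosses
    (heads with probability beta). Returns (stopping time or None, samples used)."""
    c, used = 0.0, 0
    for n, x in enumerate(xs, start=1):
        if rng.random() < beta:           # coin toss, independent of the data
            used += 1
            c = max(c + log_lr(x), 0.0)   # ordinary CuSum update
            if c > D:
                return n, used
    return None, used
```

With $\beta = 1$ every coin toss succeeds, and the scheme reduces to the CuSum algorithm. With $\beta < 1$, the fraction of pre-change samples used is close to $\beta$ regardless of the value of the statistic, whereas the DE-CuSum algorithm decides to skip based on the value of its statistic.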

We now provide numerical results showing that (32) provides a good estimate for the PDC. We use the following parameters: $f_0 \sim \mathcal{N}(0,1)$, $f_1 \sim \mathcal{N}(0.75,1)$, and $h = \infty$. In Table IIa we fix the value of $\mu$ and vary $D$. We then compare the PDC obtained using simulations with the one obtained using (32), that is, using the approximation $\mathrm{PDC} \approx \mu/(\mu + D(f_0 \,\|\, f_1))$. We see that the approximation becomes more accurate as $D$ increases. We also note that the PDC obtained using simulations does not converge to $\mu/(\mu + D(f_0 \,\|\, f_1))$ even as $D$ becomes large, because of the ceiling function in the PDC expression; see (13) and (22).


In Table IIb we next fix a large value of $D$, specifically $D = 6$, for which the PDC approximation is most accurate in Table IIa. We then check the accuracy of the approximation $\mu/(\mu + D(f_0 \,\|\, f_1))$ by varying $\mu$. We see in the table that the approximation is more accurate for small values of $\mu$. This is because the effect of the ceiling function in the PDC (see (13) and (22)) is negligible when $\mu$ is small.

Fig. 5: Trade-off curves for the DE-CuSum algorithm for PDC = 0.25, 0.5, with $f_0 \sim \mathcal{N}(0,1)$ and $f_1 \sim \mathcal{N}(0.75,1)$ (horizontal axis: log(FAR)).

Fig. 6: Comparative performance of the DE-CuSum algorithm with the CuSum algorithm and the fractional-sampling scheme: PDC = 0.5, with $f_0 \sim \mathcal{N}(0,1)$ and $f_1 \sim \mathcal{N}(0.75,1)$ (curves: Fractional Sampling, DE-CuSum, CuSum; horizontal axis: log(FAR)).

VII. Conclusions and future work 

We proposed two minimax formulations for data-efficient non-Bayesian quickest change detection, which extend the standard minimax formulations in [6] and [7] to the data-efficient setting. We proposed an algorithm called the DE-CuSum algorithm, which is a modified version of the CuSum algorithm from [13]. We showed that it is asymptotically optimal for both of the proposed minimax formulations as the false alarm rate goes to zero.
(a) Fixed μ

D    μ      PDC (simulations)   PDC (approximation (32), μ/(μ + D(f0‖f1)))
1    0.1    0.16                0.26
2    0.1    0.20                0.26
3    0.1    0.22                0.26
4    0.1    0.238               0.26
6    0.1    0.248               0.26

(b) Fixed D

D    μ      PDC (simulations)   PDC (approximation (32), μ/(μ + D(f0‖f1)))
6    0.01   0.033               0.034
6    0.05   0.145               0.151
6    0.2    0.37                0.41
6    0.3    0.46                0.51
6    0.4    0.51                0.58
6    0.6    0.58                0.68

TABLE II: Comparison of PDC obtained using simulations with the approximation (32) for $f_0 \sim \mathcal{N}(0,1)$, $f_1 \sim \mathcal{N}(0.75,1)$ and $h = \infty$.



We discussed that, like the CuSum algorithm, the DE-CuSum algorithm can also be seen as a sequence of SPRTs. The difference is that each SPRT is now followed by a 'sleep' time, whose duration is a function of the accumulated log likelihood of the observations taken in the preceding SPRT. This similarity was exploited to analyze the performance of the DE-CuSum algorithm using standard renewal theory tools, and also to show its asymptotic optimality. We also showed in our numerical results that the DE-CuSum algorithm has good trade-off curves and provides substantial benefits over the approach of fractional sampling. The techniques developed in this paper and the insights obtained can be used to study data-efficient quickest change detection in sensor networks. See [23] for some preliminary results.



APPENDIX 

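Proof of Lemma 1: If $0 < D(f_0 \,\|\, f_1) < \infty$, then $\mathsf{E}_\infty[\lambda_\infty] < \infty$. Thus, $\mathsf{P}_\infty(\lambda_\infty < \infty) = 1$. Choose an arbitrary $D$, and partition $\{\lambda_\infty < \infty\}$ into three events:

$$A = \{\lambda_\infty < \infty\} \cap \{\log L(X_1) < 0\},$$

$$B = \{\lambda_\infty < \infty\} \cap \{\log L(X_1) \ge 0\} \cap \{W_n \text{ never crosses } D\},$$

$$C = \{\lambda_\infty < \infty\} \setminus (A \cup B).$$

Then, clearly,

$$\mathsf{P}_\infty(A) = \mathsf{P}_\infty(\log L(X_1) < 0),$$

and

$$\mathsf{P}_\infty(A \cup B) = \mathsf{P}_\infty(W_{\lambda_D} < 0) \ge \mathsf{P}_\infty(\log L(X_1) < 0) > 0.$$

Thus, $\mathsf{E}_\infty[\lambda_D \mid W_{\lambda_D} < 0]$ is well defined. Moreover, $\lambda_\infty = \lambda_D$ on $A \cup B = \{W_{\lambda_D} < 0\}$, so

$$\mathsf{E}_\infty[\lambda_\infty] \ge \mathsf{E}_\infty[\lambda_\infty;\, A \cup B] \ge \mathsf{E}_\infty[\lambda_\infty \mid A \cup B]\, \mathsf{P}_\infty(A) = \mathsf{E}_\infty[\lambda_D \mid W_{\lambda_D} < 0]\, \mathsf{P}_\infty(\log L(X_1) < 0).$$

This proves the lemma because $\mathsf{P}_\infty(\log L(X_1) < 0) > 0$. ■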

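Proof of Lemma 2: Since $T(x, y, \mu) = \lceil (y - x)/\mu \rceil$, we have

$$\frac{|y - x|}{\mu} \le T(x, y, \mu) \le \frac{|y - x|}{\mu} + 1.$$

We will use this simple inequality to obtain the upper and lower bounds.

We first obtain the upper bound. Clearly,

$$\mathsf{E}_\infty[T((W_{\lambda_D})^{h+}, 0, \mu) \mid W_{\lambda_D} < 0] \le \frac{\mathsf{E}_\infty[|(W_{\lambda_D})^{h+}| \mid W_{\lambda_D} < 0]}{\mu} + 1.$$

An upper bound for the right-hand side of the above equation is easily obtained. First note that, from (17),

$$\mathsf{E}_\infty[|(W_{\lambda_\infty})^{h+}|] \le \mathsf{E}_\infty[|W_{\lambda_\infty}|] < \infty.$$

Thus, with the notation introduced in the proof of Lemma 1 above,

$$\begin{aligned}
\mathsf{E}_\infty[|(W_{\lambda_\infty})^{h+}|] &\ge \mathsf{E}_\infty[|(W_{\lambda_\infty})^{h+}|;\, A \cup B] \\
&\ge \mathsf{E}_\infty[|(W_{\lambda_\infty})^{h+}| \mid A \cup B]\, \mathsf{P}_\infty(A) \\
&= \mathsf{E}_\infty[|(W_{\lambda_D})^{h+}| \mid W_{\lambda_D} < 0]\, \mathsf{P}_\infty(\log L(X_1) < 0).
\end{aligned}$$

Combining the last two displays with (19) gives the upper bound $T_u^{(\infty)}(h, \mu)$. This completes the proof of the upper bound.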
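For the lower bound, we have

$$\begin{aligned}
\mathsf{E}_\infty[T((W_{\lambda_D})^{h+}, 0, \mu) \mid W_{\lambda_D} < 0] &\ge \frac{\mathsf{E}_\infty[|(W_{\lambda_D})^{h+}| \mid W_{\lambda_D} < 0]}{\mu} \\
&\ge \frac{\mathsf{E}_\infty[|(W_{\lambda_D})^{h+}|;\, \{\log L(X_1) < 0\} \mid W_{\lambda_D} < 0]}{\mu} \\
&\ge \frac{\mathsf{E}_\infty\big[|(\log L(X_1))^{h+}| \,\big|\, \log L(X_1) < 0\big]\, \mathsf{P}_\infty(\log L(X_1) < 0)}{\mu} = T_\ell^{(\infty)}(h, \mu).
\end{aligned}$$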



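Proof of Lemma 3: First note that

$$\mathsf{P}_1(W_{\lambda_D} < 0) \ge \mathsf{P}_1(\log L(X_1) < 0) > 0.$$

Thus, $\mathsf{E}_1[T((W_{\lambda_D})^{h+}, 0, \mu) \mid W_{\lambda_D} < 0]$ is well defined. Also, using the inequality on $T(x, y, \mu)$ from the proof of Lemma 2, we have

$$\mathsf{E}_1[T((W_{\lambda_D})^{h+}, 0, \mu) \mid W_{\lambda_D} < 0] \le \frac{\mathsf{E}_1[|(W_{\lambda_D})^{h+}| \mid W_{\lambda_D} < 0]}{\mu} + 1. \tag{33}$$

We now get an upper bound on the right-hand side of the above equation. By Wald's likelihood ratio identity [20] and (17),

$$\begin{aligned}
\mathsf{E}_1[|(W_{\lambda_\infty})^{h+}|;\, W_{\lambda_\infty} < 0] &= \mathsf{E}_1[|(W_{\lambda_\infty})^{h+}|;\, \lambda_\infty < \infty] \\
&= \mathsf{E}_\infty\Big[|(W_{\lambda_\infty})^{h+}| \prod_{k=1}^{\lambda_\infty} L(X_k);\, \lambda_\infty < \infty\Big] \\
&= \mathsf{E}_\infty\big[|(W_{\lambda_\infty})^{h+}|\, e^{W_{\lambda_\infty}};\, W_{\lambda_\infty} < 0\big] \\
&\le \mathsf{E}_\infty[|(W_{\lambda_\infty})^{h+}|] \le \mathsf{E}_\infty[|W_{\lambda_\infty}|] < \infty.
\end{aligned} \tag{34}$$

Using again the notation introduced in the proof of Lemma 1, we have

$$\begin{aligned}
\mathsf{E}_1[|(W_{\lambda_\infty})^{h+}|;\, W_{\lambda_\infty} < 0] &\ge \mathsf{E}_1[|(W_{\lambda_\infty})^{h+}|;\, \{W_{\lambda_\infty} < 0\} \cap (A \cup B)] \\
&= \mathsf{E}_1[|(W_{\lambda_\infty})^{h+}|;\, A \cup B] \\
&\ge \mathsf{E}_1[|(W_{\lambda_\infty})^{h+}| \mid A \cup B]\, \mathsf{P}_1(A) \\
&= \mathsf{E}_1[|(W_{\lambda_D})^{h+}| \mid W_{\lambda_D} < 0]\, \mathsf{P}_1(\log L(X_1) < 0).
\end{aligned} \tag{35}$$

Thus, from (33), (34), and (35),

$$\begin{aligned}
\mathsf{E}_1[T((W_{\lambda_D})^{h+}, 0, \mu) \mid W_{\lambda_D} < 0] &\le \frac{\mathsf{E}_1[|(W_{\lambda_D})^{h+}| \mid W_{\lambda_D} < 0]}{\mu} + 1 \\
&\le \frac{\mathsf{E}_1[|(W_{\lambda_\infty})^{h+}|;\, W_{\lambda_\infty} < 0]}{\mu\, \mathsf{P}_1(\log L(X_1) < 0)} + 1 \\
&\le \frac{\mathsf{E}_\infty[|(W_{\lambda_\infty})^{h+}|]}{\mu\, \mathsf{P}_1(\log L(X_1) < 0)} + 1 = T_u^{(1)}(h, \mu) < \infty.
\end{aligned}$$

This proves the lemma.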
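Wald's likelihood ratio identity, used in (34), can be checked by simulation. The sketch below estimates both $\mathsf{E}_1[|(W_{\lambda_\infty})^{h+}|;\, \lambda_\infty < \infty]$ and $\mathsf{E}_\infty[|(W_{\lambda_\infty})^{h+}|\, e^{W_{\lambda_\infty}}]$ for the Gaussian pair of Section VI, which is assumed here. Under $\mathsf{P}_1$ the walk may never drop below zero, so walks exceeding a cap of 15 are treated as $\lambda_\infty = \infty$. By optional stopping for the $\mathsf{P}_1$-martingale $e^{-W_n}$, this errs with probability at most $e^{-15}$. Function names are hypothetical.

```python
import math
import random

THETA = 0.75  # assumed: f0 = N(0,1), f1 = N(THETA,1)


def walk(rng, mean, cap):
    """LLR walk from W_0 = 0 with X_k ~ N(mean, 1), stopped when it first drops
    below 0; returns W_{lambda_inf}, or None if it first exceeds `cap`."""
    w = 0.0
    while True:
        w += THETA * rng.gauss(mean, 1.0) - THETA * THETA / 2
        if w < 0:
            return w
        if w > cap:
            return None


def wald_lr_check(h=math.inf, reps=20000, seed=23, cap=15.0):
    """Both sides of E_1[g(W); lambda_inf < inf] = E_inf[g(W) * exp(W)],
    with g(W) = |(W)^{h+}| and W = W_{lambda_inf}."""
    rng = random.Random(seed)

    def g(w):
        return abs(max(w, -h))

    lhs = sum(g(w) for w in (walk(rng, THETA, cap) for _ in range(reps))
              if w is not None) / reps
    rhs = sum(g(w) * math.exp(w)
              for w in (walk(rng, 0.0, math.inf) for _ in range(reps))) / reps
    return lhs, rhs


if __name__ == "__main__":
    for h in (math.inf, 0.5):
        print(h, wald_lr_check(h))
```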

Proof of Lemma 5: Let

$$\tau_C(x) = \inf\{n \ge 1 : C_n > D;\ C_0 = x\}.$$

Here, $C_n$ is the CuSum statistic and evolves according to the description of the algorithm in Algorithm 2. Thus, $\tau_C(x)$ is the first time for the CuSum algorithm to cross $D$ when starting at $C_0 = x$. Clearly, $\tau_C(x) = \tau_C$ if $x = 0$. It is easy to see by sample-pathwise arguments that

$$\mathsf{E}_1[\tau_C(x)] \le \mathsf{E}_1[\tau_C].$$

The proof depends on the above inequality.

Let $A_x$ be the event that the CuSum statistic, starting with $C_0 = x$, touches zero before crossing the upper threshold $D$. Let $q_x = \mathsf{P}_1(A_x)$. Then,

$$\mathsf{E}_1[\tau_C(x)] = \mathsf{E}_1[\tau_C(x);\, A_x] + \mathsf{E}_1[\tau_C(x);\, A_x^c] \le \mathsf{E}_1[\tau_C].$$

Note that

$$\mathsf{E}_1[\tau_C(x) \mid A_x^c] = \mathsf{E}_1[\tau_W(x) \mid A_x^c].$$

We call this common constant $t_1$. Also note that, on $A_x$, the average time taken to reach 0 is the same for both the CuSum and the DE-CuSum algorithm. We call this common average conditional delay $t_2$. Thus,

$$\mathsf{E}_1[\tau_C(x)] = t_1 (1 - q_x) + q_x \big(t_2 + \mathsf{E}_1[\tau_C]\big) \le \mathsf{E}_1[\tau_C].$$

The equality in the above equation holds because, once the CuSum statistic reaches zero, the average delay from that point onwards is $\mathsf{E}_1[\tau_C]$.
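Then, for any $t_3 \ge \mathsf{E}_1[\tau_C]$, we have

$$t_1 (1 - q_x) + q_x (t_2 + t_3) \le t_3.$$

This is because, for $t_3 \ge \mathsf{E}_1[\tau_C]$,

$$\begin{aligned}
t_1 (1 - q_x) + q_x (t_2 + t_3) &= t_1 (1 - q_x) + q_x \big(t_2 + \mathsf{E}_1[\tau_C]\big) + q_x \big(t_3 - \mathsf{E}_1[\tau_C]\big) \\
&\le \mathsf{E}_1[\tau_C] + q_x \big(t_3 - \mathsf{E}_1[\tau_C]\big) \le t_3.
\end{aligned}$$

It is easy to see that

$$\mathsf{E}_1[\tau_W(x)] \le t_1 (1 - q_x) + q_x \big(t_2 + T_u^{(1)}(h, \mu) + \mathsf{E}_1[\tau_W]\big).$$

This is because, on $A_x$, the average delay of the DE-CuSum algorithm is the sum of three terms:

- the average time to reach 0, which is $t_2$;
- the average time spent below 0 due to the undershoot, which is bounded from above by $T_u^{(1)}(h, \mu)$;
- the average delay after the sojourn below 0, which is $\mathsf{E}_1[\tau_W]$ because of the renewal nature of the DE-CuSum algorithm.

Since $T_u^{(1)}(h, \mu) + \mathsf{E}_1[\tau_W] \ge \mathsf{E}_1[\tau_C]$ by Lemma 4, the first part of the lemma is proved if we set $t_3 = T_u^{(1)}(h, \mu) + \mathsf{E}_1[\tau_W]$.

For the second part, note that if $h < \infty$, the undershoot is truncated at $-h$, so the time spent below 0 is at most $T(-h, 0, \mu) = \lceil h/\mu \rceil$. The argument above then goes through with $T_u^{(1)}(h, \mu)$ replaced by $\lceil h/\mu \rceil$. ■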